Fully local conversation
💃 Run a voice agent on your computer without Internet
Introduction
You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models have been optimized to run on CPU, which means you can have a conversation with an AI locally, without an Internet connection (and thus for free!).
Disclaimer: Many of these models are optimized for Apple Silicon, so this may work best on an M1 or M2 Mac.
Setting up the conversation
Start by copying the StreamingConversation quickstart. This example uses Deepgram for transcription, ChatGPT for the LLM, and Azure for synthesis - we'll replace each piece with a corresponding open-source model.
conversation = StreamingConversation(
    output_device=speaker_output,
    transcriber=DeepgramTranscriber(
        DeepgramTranscriberConfig.from_input_device(microphone_input)
    ),
    agent=ChatGPTAgent(
        ChatGPTAgentConfig(
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life",
        )
    ),
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    logger=logger,
)
Whisper.cpp
Follow the steps in the whisper.cpp repo to download one of the models.
As of now (2023/05/01), here’s an example flow to do this:
- Clone the whisper.cpp repo
- From the whisper.cpp directory, run:
bash ./models/download-ggml-model.sh tiny.en
make
Find your (absolute) paths for the whisper.cpp shared library file and the model you've just downloaded. If whisper.cpp is downloaded at /whisper.cpp, the paths from the previous example would be:
/whisper.cpp/libwhisper.so
/whisper.cpp/models/ggml-tiny.bin
Set up your streaming WhisperCPPTranscriber in StreamingConversation as follows:
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber

StreamingConversation(
    ...
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="/whisper.cpp/libwhisper.so",
            fname_model="/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    ...
)
GPT4All
Install the pygpt4all package by running:
pip install pygpt4all
Download the latest GPT4All-J model from the pygpt4all repo.
As of today (2023/05/01), you can download it by visiting: https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
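For example, you can fetch it from the command line (assuming wget is available; curl works equally well):
wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin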
Set up your agent in StreamingConversation as follows:
from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.agent.gpt4all_agent import GPT4AllAgent

StreamingConversation(
    ...
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-gpt4all-j-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life",
        )
    ),
    ...
)
Llama.cpp
You can use any model supported by llama.cpp with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use NousResearch/Nous-Hermes-13b in this example because it currently ranks highly on HuggingFace’s Open LLM Leaderboard.
Our implementation is built on top of langchain, which integrates with llama.cpp through llama-cpp-python.
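One convenient way to grab a quantized GGML build of the model is via huggingface_hub. The repo_id and filename below are assumptions for illustration - check Hugging Face for the current quantized builds of Nous-Hermes-13b and substitute accordingly:
# Hypothetical example: download a 4-bit GGML quantization of Nous-Hermes-13b.
# The repo_id and filename are assumptions - verify them on Hugging Face first.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Nous-Hermes-13B-GGML",
    filename="nous-hermes-13b.ggmlv3.q4_0.bin",
)
print(model_path)  # pass this path as model_path in llamacpp_kwargs below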
Install llama-cpp-python by running the following:
pip install llama-cpp-python
or run the following to install it with support for offloading model layers to a GPU via CUDA:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
llama-cpp-python has more installation commands for different BLAS backends.
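For instance, if you're on Apple Silicon (as the disclaimer above suggests), there is a Metal-enabled build; the flag below reflects llama-cpp-python's install instructions at the time of writing, so double-check their README:
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python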
Set up your agent in StreamingConversation as follows:
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent

StreamingConversation(
    ...
    agent=LlamacppAgent(
        LlamacppAgentConfig(
            prompt_preamble="The AI is having a pleasant conversation about life",
            llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin", "verbose": True},
            prompt_template="alpaca",
            initial_message=BaseMessage(text="Hello!"),
        )
    ),
    ...
)
You can add the key n_gpu_layers to llamacpp_kwargs to offload some of the model's layers to a GPU.
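For example, the agent config above becomes the following (the value 40 is an arbitrary illustration - tune it to your GPU's memory):
agent=LlamacppAgent(
    LlamacppAgentConfig(
        prompt_preamble="The AI is having a pleasant conversation about life",
        llamacpp_kwargs={
            "model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin",
            "n_gpu_layers": 40,  # number of layers to offload to the GPU; example value
            "verbose": True,
        },
        prompt_template="alpaca",
        initial_message=BaseMessage(text="Hello!"),
    )
)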
Coqui TTS
Install the Coqui TTS package by running:
pip install TTS
See the Coqui TTS repo for more instructions in case you run into bugs.
Find which open-source speech synthesis model you'd like to use. One way to do this is to run:
tts --list_models
For this example, we’ll use Tacotron2.
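If you'd like to hear the model before wiring it into the conversation, the TTS CLI can synthesize a sample clip to a file (the flags below reflect the TTS CLI at the time of writing - run tts --help to confirm):
tts --text "Hello there!" --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path sample.wav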
Set up your synthesizer in StreamingConversation as follows:
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer

StreamingConversation(
    ...
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            },
        )
    ),
    ...
)
Run the conversation
Putting this all together, our StreamingConversation instance looks like:
StreamingConversation(
    output_device=speaker_output,
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="path/to/whisper.cpp/libwhisper.so",
            fname_model="path/to/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life",
        )
    ),
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            },
        )
    ),
    logger=logger,
)
Start the conversation by running:
python quickstarts/streaming_conversation.py