Run a voice agent on your computer without Internet
You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models have been optimized to run on CPU, which means that you can have a conversation with an AI locally without Internet (and thus for free!).
Disclaimer: Many of these models are optimized for Apple Silicon, so this may work best on an M1 or M2 Mac computer.
Start by copying the StreamingConversation quickstart.
This example uses Deepgram for transcription, ChatGPT for the LLM, and Azure for synthesis - we'll be replacing each piece with a corresponding open-source model.
Follow the steps in the whisper.cpp repo to download one of the models.
As of 2023/05/01, an example flow is to clone the whisper.cpp repo, build the shared library (for instance with `make libwhisper.so`), and download a model with the repo's download script (for instance `bash models/download-ggml-model.sh tiny`).
Find your (absolute) paths for the whisper.cpp shared library file and the model you've just downloaded.
If whisper.cpp is downloaded at /whisper.cpp, the paths from the previous example would be:
/whisper.cpp/libwhisper.so
/whisper.cpp/models/ggml-tiny.bin
Set up your streaming WhisperCPPTranscriber in StreamingConversation as follows:
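The sketch below assumes Vocode's usual component/Config naming (a WhisperCPPTranscriberConfig with libname and fname_model fields for the shared-library and model paths); the exact import paths and field names are assumptions, so check your installed version's API reference if anything differs.

```python
from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber import WhisperCPPTranscriber

# Microphone/speaker setup, as in the StreamingConversation quickstart
microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
    use_default_devices=True
)

# Point the (assumed) libname/fname_model fields at your whisper.cpp paths
transcriber = WhisperCPPTranscriber(
    WhisperCPPTranscriberConfig.from_input_device(
        microphone_input,
        buffer_size_seconds=1,
        libname="/whisper.cpp/libwhisper.so",
        fname_model="/whisper.cpp/models/ggml-tiny.bin",
    )
)
```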
Install the pygpt4all package (for example, with `pip install pygpt4all`).
Download the latest GPT4All-J model from the pygpt4all repo.
As of today (2023/05/01), you can download it by visiting: https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
Set up your agent in StreamingConversation as follows:
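A minimal sketch of the agent setup is below; the GPT4AllAgent and GPT4AllAgentConfig names and the model_path field are inferred from Vocode's naming conventions rather than confirmed here.

```python
from vocode.streaming.agent import GPT4AllAgent
from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.models.message import BaseMessage

# model_path points at the GPT4All-J weights downloaded above
agent = GPT4AllAgent(
    GPT4AllAgentConfig(
        model_path="/path/to/ggml-gpt4all-j-v1.3-groovy.bin",
        initial_message=BaseMessage(text="Hello!"),
        prompt_preamble="The AI is having a pleasant conversation about life.",
    )
)
```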
You can use any model supported by llama.cpp with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use NousResearch/Nous-Hermes-13b in this example because it currently ranks highly on HuggingFace's Open LLM Leaderboard.
Our implementation is built on top of langchain, which integrates with llama.cpp through llama-cpp-python.
Install llama-cpp-python by running `pip install llama-cpp-python`. Or, to install it with support for offloading model layers to a GPU via CUDA, build it with cuBLAS enabled (for example, `CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python`).
The llama-cpp-python README lists more installation commands for different BLAS backends.
Set up your agent in StreamingConversation as follows:
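A sketch of the llama.cpp agent setup follows; again, the import paths and config fields are assumptions, with llamacpp_kwargs presumed to be passed through to llama-cpp-python's Llama constructor.

```python
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.models.message import BaseMessage

agent = LlamacppAgent(
    LlamacppAgentConfig(
        initial_message=BaseMessage(text="Hello!"),
        prompt_preamble="The AI is having a pleasant conversation about life.",
        # Path to a GGML quantization of Nous-Hermes-13b (filename is illustrative)
        llamacpp_kwargs={"model_path": "/path/to/nous-hermes-13b.ggmlv3.q4_0.bin"},
    )
)
```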
You can add the key n_gpu_layers to the llamacpp_kwargs to offload some of the model's layers to a GPU.
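For instance, continuing the assumed config above (n_gpu_layers is llama-cpp-python's GPU offload parameter):

```python
# Offload the first 40 transformer layers to the GPU (tune for your GPU memory)
llamacpp_kwargs = {
    "model_path": "/path/to/nous-hermes-13b.ggmlv3.q4_0.bin",
    "n_gpu_layers": 40,
}
```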
Install the Coqui TTS package (for example, with `pip install TTS`).
See the Coqui TTS repo for further instructions if you run into issues.
Find which open-source speech synthesis model you'd like to use. One way to do this is to run `tts --list_models`.
For this example, we'll use Tacotron2.
Set up your synthesizer in StreamingConversation as follows:
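A sketch of the synthesizer setup, with the same caveat that the config class and the tts_kwargs field are assumed names; the model_name value is a standard Coqui TTS Tacotron2 model id.

```python
from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.synthesizer import CoquiTTSSynthesizer

# Speaker output from the quickstart's microphone/speaker helper
microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
    use_default_devices=True
)

synthesizer = CoquiTTSSynthesizer(
    CoquiTTSSynthesizerConfig.from_output_device(
        speaker_output,
        tts_kwargs={"model_name": "tts_models/en/ljspeech/tacotron2-DDC"},
    )
)
```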
Putting this all together, our StreamingConversation instance looks like:
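Under the same assumptions as the snippets above (import paths and field names inferred, here using the llama.cpp agent), a combined script mirroring the quickstart's structure might look roughly like this:

```python
import asyncio

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer import CoquiTTSSynthesizer
from vocode.streaming.transcriber import WhisperCPPTranscriber


async def main():
    microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=True
    )

    conversation = StreamingConversation(
        output_device=speaker_output,
        # whisper.cpp for transcription (libname/fname_model fields are assumed)
        transcriber=WhisperCPPTranscriber(
            WhisperCPPTranscriberConfig.from_input_device(
                microphone_input,
                buffer_size_seconds=1,
                libname="/whisper.cpp/libwhisper.so",
                fname_model="/whisper.cpp/models/ggml-tiny.bin",
            )
        ),
        # llama.cpp (via llama-cpp-python) for the LLM
        agent=LlamacppAgent(
            LlamacppAgentConfig(
                initial_message=BaseMessage(text="Hello!"),
                prompt_preamble="The AI is having a pleasant conversation about life.",
                llamacpp_kwargs={"model_path": "/path/to/nous-hermes-13b.ggmlv3.q4_0.bin"},
            )
        ),
        # Coqui TTS for synthesis
        synthesizer=CoquiTTSSynthesizer(
            CoquiTTSSynthesizerConfig.from_output_device(
                speaker_output,
                tts_kwargs={"model_name": "tts_models/en/ljspeech/tacotron2-DDC"},
            )
        ),
    )

    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```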
Start the conversation by running the script, for example `python local_conversation.py` (use whichever filename you saved the example under).