Introduction
You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models have been optimized to run on CPU, which means that you can have a conversation with an AI locally, without Internet access (and thus for free!). Disclaimer: many of these models are optimized for Apple Silicon, so this may work best on an M1 or M2 Mac.

Setting up the conversation
Start by copying the StreamingConversation quickstart. This example uses Deepgram for transcription, ChatGPT as the LLM, and Azure for synthesis; we’ll be replacing each piece with a corresponding open-source model.
Whisper.cpp
Follow the steps in the whisper.cpp repo to download one of the models. As of now (2023/05/01), here’s an example flow to do this:

- Clone the whisper.cpp repo
- From the whisper.cpp directory, build the shared library and download a model, e.g. `make libwhisper.so` and `bash ./models/download-ggml-model.sh tiny` (exact commands may vary with the whisper.cpp version)

If your clone lives at /whisper.cpp, the paths from the previous example would be:

- /whisper.cpp/libwhisper.so
- /whisper.cpp/models/ggml-tiny.bin
Then plug WhisperCPPTranscriber into your StreamingConversation as follows:
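A minimal sketch, assuming the class and module names from mid-2023 vocode-python releases (WhisperCPPTranscriberConfig.from_input_device and its buffer_size_seconds, libname, and fname_model parameters may differ in your version); microphone_input is the device created in the quickstart:

```python
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber

# Swap this in for the quickstart's Deepgram transcriber; the paths point at
# the shared library and model built/downloaded above
transcriber = WhisperCPPTranscriber(
    WhisperCPPTranscriberConfig.from_input_device(
        microphone_input,
        buffer_size_seconds=1,
        libname="/whisper.cpp/libwhisper.so",
        fname_model="/whisper.cpp/models/ggml-tiny.bin",
    )
)
```

Pass this object as the transcriber argument when constructing StreamingConversation.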
GPT4All
Install the pygpt4all package by running `pip install pygpt4all`. Then plug the GPT4All agent into your StreamingConversation as follows:
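A sketch assuming vocode-python ships a GPT4AllAgent/GPT4AllAgentConfig pair (the names, module paths, and model_path parameter are assumptions based on mid-2023 releases, and the model file path is hypothetical):

```python
from vocode.streaming.agent.gpt4all_agent import GPT4AllAgent
from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.models.message import BaseMessage

agent = GPT4AllAgent(
    GPT4AllAgentConfig(
        model_path="path/to/ggml-gpt4all-j-v1.3-groovy.bin",  # hypothetical local model file
        initial_message=BaseMessage(text="Hello!"),
        prompt_preamble="The AI is having a pleasant conversation about life",
    )
)
```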
Llama.cpp
You can use any model supported by llama.cpp with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use NousResearch/Nous-Hermes-13b in this example because it currently ranks highly on HuggingFace’s Open LLM Leaderboard. Our implementation is built on top of langchain, which integrates with llama.cpp through llama-cpp-python.

Install llama-cpp-python by running `pip install llama-cpp-python`. Then plug the llama.cpp agent into your StreamingConversation as follows:
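A sketch assuming a LlamacppAgent/LlamacppAgentConfig pair (the names and module paths are assumptions; llamacpp_kwargs, mentioned below, is passed through to llama-cpp-python, and the model path is hypothetical):

```python
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.models.message import BaseMessage

agent = LlamacppAgent(
    LlamacppAgentConfig(
        initial_message=BaseMessage(text="Hello!"),
        prompt_preamble="The AI is having a pleasant conversation about life",
        # Forwarded to llama-cpp-python's Llama constructor
        llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin"},
    )
)
```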
You can add n_gpu_layers to the llamacpp_kwargs to offload some of the model’s layers to a GPU.
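For example (the layer count here is arbitrary, and offloading requires a GPU-enabled llama-cpp-python build):

```python
# n_gpu_layers is a llama-cpp-python option: how many layers to run on the GPU
llamacpp_kwargs = {
    "model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin",
    "n_gpu_layers": 40,
}
```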
Coqui TTS
Install the Coqui TTS package by running `pip install TTS`. Then plug the Coqui synthesizer into your StreamingConversation as follows:
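A sketch assuming a CoquiTTSSynthesizer/CoquiTTSSynthesizerConfig pair (the names, module paths, and tts_kwargs parameter are assumptions); speaker_output is the device from the quickstart, and the model name is one of Coqui’s published English models:

```python
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer

synthesizer = CoquiTTSSynthesizer(
    CoquiTTSSynthesizerConfig.from_output_device(
        speaker_output,
        # tts_kwargs are forwarded to Coqui TTS to select a model
        tts_kwargs={"model_name": "tts_models/en/ljspeech/tacotron2-DDC"},
    )
)
```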
Run the conversation
Putting this all together, our StreamingConversation instance looks like:
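A sketch of the assembled conversation, combining the three local components above with the quickstart’s event-loop scaffolding (all class names, module paths, and file paths carry the same assumptions as the per-section sketches):

```python
import asyncio
import signal

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber


async def main():
    # Set up local audio I/O as in the quickstart
    microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output()

    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=WhisperCPPTranscriber(
            WhisperCPPTranscriberConfig.from_input_device(
                microphone_input,
                buffer_size_seconds=1,
                libname="/whisper.cpp/libwhisper.so",
                fname_model="/whisper.cpp/models/ggml-tiny.bin",
            )
        ),
        agent=LlamacppAgent(
            LlamacppAgentConfig(
                initial_message=BaseMessage(text="Hello!"),
                prompt_preamble="The AI is having a pleasant conversation about life",
                llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin"},
            )
        ),
        synthesizer=CoquiTTSSynthesizer(
            CoquiTTSSynthesizerConfig.from_output_device(
                speaker_output,
                tts_kwargs={"model_name": "tts_models/en/ljspeech/tacotron2-DDC"},
            )
        ),
    )
    await conversation.start()
    # Terminate cleanly on Ctrl-C
    signal.signal(signal.SIGINT, lambda _0, _1: asyncio.create_task(conversation.terminate()))
    # Pump microphone audio into the conversation until it ends
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```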