Introduction

You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models have been optimized to run on CPU, which means you can have a conversation with an AI locally, without an Internet connection (and thus for free!).

Disclaimer: Many of these models are optimized for Apple Silicon, so this may work best on an M1 or M2 Mac.

Setting up the conversation

Start by copying the StreamingConversation quickstart. This example uses Deepgram for transcription, ChatGPT as the LLM, and Azure for synthesis; we’ll replace each piece with a corresponding open-source model.

conversation = StreamingConversation(
    output_device=speaker_output,
    transcriber=DeepgramTranscriber(
        DeepgramTranscriberConfig.from_input_device(microphone_input)
    ),
    agent=ChatGPTAgent(
        ChatGPTAgentConfig(
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    logger=logger,
)

Whisper.cpp

Follow the steps in the whisper.cpp repo to download one of the models.

As of now (2023/05/01), here’s an example flow to do this:

  1. Clone the whisper.cpp repo
  2. From the whisper.cpp directory, run:

./models/download-ggml-model.sh tiny.en
make

Find the absolute paths to the whisper.cpp shared library and the model you’ve just downloaded. If whisper.cpp is cloned at /whisper.cpp, the paths from the previous example would be:

  • /whisper.cpp/libwhisper.so
  • /whisper.cpp/models/ggml-tiny.bin
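
If you want to verify these paths before wiring them into the transcriber config, here’s a minimal sketch (assuming the /whisper.cpp location used above; adjust to your own clone location):

from pathlib import Path

# Both files must exist before the transcriber can load them.
assert Path("/whisper.cpp/libwhisper.so").exists(), "shared library not found"
assert Path("/whisper.cpp/models/ggml-tiny.bin").exists(), "model not found"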

Set up your streaming WhisperCPPTranscriber in StreamingConversation as follows:

from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber

StreamingConversation(
    ...
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="/whisper.cpp/libwhisper.so",
            fname_model="/whisper.cpp/models/ggml-tiny.bin",
        )
    )
    ...
)

GPT4All

Install the pygpt4all package by running:

pip install pygpt4all

Download the latest GPT4All-J model from the pygpt4all repo.

As of today (2023/05/01), you can download it by visiting: https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin
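
For example, here’s a minimal sketch that fetches the model with Python’s standard library (the destination filename is just an example; save it wherever you keep models):

import urllib.request

url = "https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin"
# Note: this file is several GB, so the download can take a while.
urllib.request.urlretrieve(url, "ggml-gpt4all-j-v1.3-groovy.bin")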

Set up your agent in StreamingConversation as follows:

from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.agent.gpt4all_agent import GPT4AllAgent

StreamingConversation(
    ...
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-gpt4all-j-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    )
    ...
)

Llama.cpp

You can use any model supported by llama.cpp with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use NousResearch/Nous-Hermes-13b in this example because it currently ranks highly on HuggingFace’s Open LLM Leaderboard.

Our implementation is built on top of langchain, which integrates with llama.cpp through llama-cpp-python.

Install llama-cpp-python by running the following:

pip install llama-cpp-python

or run the following to install it with support for offloading model layers to a GPU via CUDA:

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python

The llama-cpp-python docs list more installation commands for other BLAS backends.
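
Optionally, you can confirm that the installation loads your model before wiring it into Vocode. Here’s a minimal sketch using langchain’s LlamaCpp wrapper (the model path is a placeholder for whichever quantized GGML file you downloaded):

from langchain.llms import LlamaCpp

# model_path is a placeholder; point it at your downloaded GGML file.
llm = LlamaCpp(model_path="path/to/nous-hermes-13b.ggmlv3.q4_0.bin", n_ctx=2048)
print(llm("Say hello in one short sentence."))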

Set up your agent in StreamingConversation as follows:

from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent

StreamingConversation(
    ...
    agent=LlamacppAgent(
        LlamacppAgentConfig(
            prompt_preamble="The AI is having a pleasant conversation about life",
            llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin", "verbose": True},
            prompt_template="alpaca",
            initial_message=BaseMessage(text="Hello!"),
        )
    )
    ...
)

You can add the key n_gpu_layers to the llamacpp_kwargs to offload some of the model’s layers to a GPU.
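
For example, here’s a sketch of llamacpp_kwargs with GPU offloading (the layer count is an illustrative value; tune it to your GPU’s memory):

# Example llamacpp_kwargs with GPU offloading enabled.
llamacpp_kwargs = {
    "model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin",
    "n_gpu_layers": 40,  # illustrative value; how many layers to offload
    "verbose": True,
}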

Coqui TTS

Install the Coqui TTS package by running:

pip install TTS

See the Coqui TTS repo for more detailed instructions if you run into issues.

Find which open-source speech synthesis model you’d like to use. One way to do this is to run:

tts --list_models

For this example, we’ll use Tacotron2.
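
If you’d like to sanity-check the model outside of Vocode first, here’s a minimal sketch using Coqui’s Python API (the output filename is arbitrary):

from TTS.api import TTS

# Downloads the model on first use, then synthesizes a short test clip.
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC_ph")
tts.tts_to_file(text="Hello from Coqui TTS!", file_path="test.wav")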

Set up your synthesizer in StreamingConversation as follows:

from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer

StreamingConversation(
    ...
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    )
    ...
)

Run the conversation

Putting this all together, our StreamingConversation instance looks like:

StreamingConversation(
    output_device=speaker_output,
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="path/to/whisper.cpp/libwhisper.so",
            fname_model="path/to/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs={
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    ),
    logger=logger,
)

Start the conversation by running:

python quickstarts/streaming_conversation.py