> ## Documentation Index
> Fetch the complete documentation index at: https://docs.vocode.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# Fully local conversation

> 💃 Run a voice agent on your computer without Internet

# Introduction

You can use Vocode to interact with open-source transcription, large language, and synthesis models. Many of these models
have been optimized to run on CPU, which means that you can have a conversation with an AI locally without Internet (and thus for free!).

Disclaimer: Many of these models are optimized for Apple Silicon, so this may work best on a M1 or M2 Mac computer.

# Setting up the conversation

Start by copying the [`StreamingConversation`](https://github.com/vocodedev/vocode-python/blob/main/quickstarts/streaming_conversation.py) quickstart.
This example uses Deepgram for transcription, ChatGPT for LLM, and Azure for synthesis - we'll be replacing each piece with
a corresponding open-source model.

```python theme={null}
conversation = StreamingConversation(
    output_device=speaker_output,
    transcriber=DeepgramTranscriber(
        DeepgramTranscriberConfig.from_input_device(microphone_input)
    ),
    agent=ChatGPTAgent(
        ChatGPTAgentConfig(
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    logger=logger,
)
```

## Whisper.cpp

Follow the steps in the [whisper.cpp repo](https://github.com/ggerganov/whisper.cpp#quick-start) to download one of the models.

As of now (2023/05/01), here's an example flow to do this:

1. Clone the whisper.cpp repo
2. From the whisper.cpp directory, run:

```
bash ./models/download-ggml-model.sh tiny.en
make
```

Find your (absolute) paths for the whisper.cpp shared library file and the model you've just downloaded.
If whisper.cpp is downloaded at `/whisper.cpp`, the paths from the previous example would be:

* `/whisper.cpp/libwhisper.so`
* `/whisper.cpp/models/ggml-tiny.bin`

Set up your streaming `WhisperCPPTranscriber` in `StreamingConversation` as follows:

```python theme={null}
from vocode.streaming.models.transcriber import WhisperCPPTranscriberConfig
from vocode.streaming.transcriber.whisper_cpp_transcriber import WhisperCPPTranscriber

StreamingConversation(
    ...
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="/whisper.cpp/libwhisper.so",
            fname_model="/whisper.cpp/models/ggml-tiny.bin",
        )
    )
    ...
)
```

## GPT4All

Install the `pygpt4all` package by running:

```
pip install pygpt4all
```

Download the latest GPT4All-J model from the [pygpt4all repo](https://github.com/nomic-ai/pygpt4all#gpt4all-j-model).

As of today (2023/05/01), you can download it by visiting: [https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin](https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin)

Set up your agent in `StreamingConversation` as follows:

```python theme={null}
from vocode.streaming.models.agent import GPT4AllAgentConfig
from vocode.streaming.agent.gpt4all_agent import GPT4AllAgent

StreamingConversation(
    ...
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-gpt4all-j-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    )
    ...
)
```

## Llama.cpp

You can use any model supported by [llama.cpp](https://github.com/ggerganov/llama.cpp) with Vocode. This includes LLaMA, Alpaca, Vicuna, Koala, WizardLM, and more. We will use [NousResearch/Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b) in this example because it currently ranks highly on HuggingFace's [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

Our implementation is built on top of [langchain](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/llamacpp), which integrates with llama.cpp through [llama-cpp-python](https://github.com/abetlen/llama-cpp-python).

Install `llama-cpp-python` by running the following:

```
pip install llama-cpp-python
```

or run the following to install it with support for offloading model layers to a GPU via CUDA:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```

[llama-cpp-python](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal) has more installation commands for different BLAS backends.

Set up your agent in `StreamingConversation` as follows:

```python theme={null}
from vocode.streaming.models.agent import LlamacppAgentConfig
from vocode.streaming.agent.llamacpp_agent import LlamacppAgent

StreamingConversation(
    ...
    agent=LlamacppAgent(
        LlamacppAgentConfig(
            prompt_preamble="The AI is having a pleasant conversation about life",
            llamacpp_kwargs={"model_path": "path/to/nous-hermes-13b.ggmlv3.q4_0.bin", "verbose": True},
            prompt_template="alpaca",
            initial_message=BaseMessage(text="Hello!"),
        )
    )
    ...
)
```

You can add the key `n_gpu_layers` to the `llamacpp_kwargs` to offload some of the model's layers to a GPU.

## Coqui TTS

Install the Coqui TTS package by running:

```
pip install TTS
```

See the [Coqui TTS repo](https://github.com/coqui-ai/TTS#install-tts) for more instructions in case you run into bugs.

Find which OS speech synthesis model you'd like to use. One way to do this is to run:

```
tts --list_models
```

For this example, we'll use [Tacotron2](https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html).

Set up your synthesizer in `StreamingConversation` as follows:

```python theme={null}
from vocode.streaming.models.synthesizer import CoquiTTSSynthesizerConfig,
from vocode.streaming.synthesizer.coqui_tts_synthesizer import CoquiTTSSynthesizer

StreamingConversation(
    ...
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs = {
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    )
    ...
)
```

# Run the conversation

Putting this all together, our `StreamingConversation` instance looks like:

```python theme={null}
StreamingConversation(
    output_device=speaker_output,
    transcriber=WhisperCPPTranscriber(
        WhisperCPPTranscriberConfig.from_input_device(
            microphone_input,
            libname="path/to/whisper.cpp/libwhisper.so",
            fname_model="path/to/whisper.cpp/models/ggml-tiny.bin",
        )
    ),
    agent=GPT4AllAgent(
        GPT4AllAgentConfig(
            model_path="path/to/ggml-...-.bin",
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="The AI is having a pleasant conversation about life"
        )
    ),
    synthesizer=CoquiTTSSynthesizer(
        CoquiTTSSynthesizerConfig.from_output_device(
            speaker_output,
            tts_kwargs = {
                "model_name": "tts_models/en/ljspeech/tacotron2-DDC_ph"
            }
        )
    ),
    logger=logger,
)
```

Start the conversation by running:

```
python quickstarts/streaming_conversation.py
```
