Overview

Synthesizers are used to convert text into speech; this guide will show you how to configure and use synthesizers in Vocode.

Supported Synthesizers

Vocode currently supports the following synthesizers:

  1. Azure (Microsoft)
  2. Google
  3. Eleven Labs
  4. Rime
  5. Play.ht
  6. GTTS (Google Text-to-Speech)
  7. Stream Elements
  8. Bark
  9. Amazon Polly

These synthesizers are defined using their respective configuration classes, which are subclasses of the SynthesizerConfig class.

Configuring Synthesizers

To use a synthesizer, you need to create a configuration object for the synthesizer you want to use. Here are some examples of how to create configuration objects for different synthesizers:

Example 1: Using Eleven Labs with a phone call

from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

server = InboundCallServer(
    ...
    synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("YOUR VOICE ID")
    )
    ...
)

In this example, the ElevenLabsSynthesizerConfig.from_telephone_output_device() method is used to create a configuration object for the Eleven Labs synthesizer. The method hardcodes some values like the sampling_rate and audio_encoding for compatibility with telephone output devices.

ElevenLabs Input Streaming

You can try out our experimental implementation of ElevenLabs’ input streaming API by passing in experimental_websocket=True into the config and using the ElevenLabsWSSynthesizer, like:

from vocode.streaming.synthesizer.eleven_labs_websocket_synthesizer import ElevenLabsWSSynthesizer
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

...
synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("YOUR VOICE ID"),
    experimental_websocket=True
)
...
synthesizer=ElevenLabsWSSynthesizer(ElevenLabsSynthesizerConfig.from_output_device(
    speaker_output,
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("YOUR VOICE ID"),
    experimental_websocket=True
))
...

Play.ht v2

We now support Play.ht’s new gRPC streaming API, which runs much faster than their HTTP API and is designed for realtime communication.

...
synthesizer_config=PlayHtSynthesizerConfig.from_telephone_output_device(
    api_key=os.getenv("PLAY_HT_API_KEY"),
    user_id=os.getenv("PLAY_HT_USER_ID"),
)
...

Example 2: Using Azure in StreamingConversation locally

from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.helpers import create_microphone_input_and_speaker_output

microphone_input, speaker_output = create_microphone_input_and_speaker_output(
        streaming=True, use_default_devices=False
)

conversation = StreamingConversation(
    ...
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    ...
)

In this example, the AzureSynthesizerConfig.from_output_device() method is used to create a configuration object for the Azure synthesizer. The method takes a speaker_output object as an argument, and extracts the sampling_rate and audio_encoding from the output device.