Using synthesizers
How to control the voice of your application.
Overview
Synthesizers convert text into speech. This guide shows you how to configure and use synthesizers in Vocode.
Supported Synthesizers
Vocode currently supports the following synthesizers:
- Azure (Microsoft)
- Eleven Labs
- Rime
- Play.ht
- GTTS (Google Text-to-Speech)
- Stream Elements
- Bark
- Amazon Polly
These synthesizers are defined using their respective configuration classes, each a subclass of the SynthesizerConfig class.
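As a rough sketch of that pattern (the field names and defaults below are illustrative, not Vocode's actual ones), each provider's config subclasses a common base that carries the shared audio parameters:

```python
from dataclasses import dataclass

# Illustrative sketch only -- these are not Vocode's real class definitions.
@dataclass
class SynthesizerConfig:
    sampling_rate: int
    audio_encoding: str

@dataclass
class AzureSynthesizerConfig(SynthesizerConfig):
    voice_name: str = "en-US-AriaNeural"  # hypothetical provider-specific field

config = AzureSynthesizerConfig(sampling_rate=16000, audio_encoding="linear16")
assert isinstance(config, SynthesizerConfig)  # provider configs share the base type
```

Because every provider config shares the base type, the rest of the pipeline can accept any SynthesizerConfig without caring which provider backs it.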
Configuring Synthesizers
To use a synthesizer, you need to create a configuration object for the synthesizer you want to use. Here are some examples of how to create configuration objects for different synthesizers:
Example 1: Using Eleven Labs with a phone call
```python
import os

from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

server = InboundCallServer(
    ...
    synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("YOUR VOICE ID"),
    ),
    ...
)
```
In this example, the ElevenLabsSynthesizerConfig.from_telephone_output_device() method is used to create a configuration object for the Eleven Labs synthesizer. The method hardcodes some values like the sampling_rate and audio_encoding for compatibility with telephone output devices.
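To illustrate the idea in a standalone sketch (this is not Vocode's implementation, and its actual values may differ), a telephone-oriented constructor can pin the audio parameters rather than accept them from the caller; 8 kHz mu-law is the usual telephony audio format:

```python
from dataclasses import dataclass

# Standalone sketch of a from_telephone_output_device-style constructor.
# Class and field names are hypothetical, not Vocode's.
@dataclass
class TelephonySynthConfig:
    sampling_rate: int
    audio_encoding: str
    api_key: str = ""
    voice_id: str = ""

    @classmethod
    def from_telephone_output_device(cls, api_key: str = "", voice_id: str = ""):
        # Fixed rather than caller-supplied: telephone output devices
        # consume 8 kHz mu-law audio.
        return cls(sampling_rate=8000, audio_encoding="mulaw",
                   api_key=api_key, voice_id=voice_id)

config = TelephonySynthConfig.from_telephone_output_device(
    api_key="dummy", voice_id="dummy"
)
# config.sampling_rate is 8000 and config.audio_encoding is "mulaw",
# regardless of what the caller passes in.
```

Hardcoding these parameters in the constructor keeps callers from accidentally producing audio the telephone device cannot play.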
ElevenLabs Input Streaming
You can try out our experimental implementation of ElevenLabs' input streaming API by passing experimental_websocket=True into the config and using the ElevenLabsWSSynthesizer, like:
```python
import os

from vocode.streaming.synthesizer.eleven_labs_websocket_synthesizer import ElevenLabsWSSynthesizer
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

# In a telephony server:
...
synthesizer_config=ElevenLabsSynthesizerConfig.from_telephone_output_device(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("YOUR VOICE ID"),
    experimental_websocket=True,
)
...

# In a streaming conversation:
...
synthesizer=ElevenLabsWSSynthesizer(
    ElevenLabsSynthesizerConfig.from_output_device(
        speaker_output,
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        voice_id=os.getenv("YOUR VOICE ID"),
        experimental_websocket=True,
    )
)
...
```
Play.ht v2
We now support Play.ht's new gRPC streaming API, which is much faster than their HTTP API and is designed for realtime communication.
```python
import os

from vocode.streaming.models.synthesizer import PlayHtSynthesizerConfig

...
synthesizer_config=PlayHtSynthesizerConfig.from_telephone_output_device(
    api_key=os.getenv("PLAY_HT_API_KEY"),
    user_id=os.getenv("PLAY_HT_USER_ID"),
),
...
```
Example 2: Using Azure in StreamingConversation locally
```python
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer
from vocode.helpers import create_microphone_input_and_speaker_output

microphone_input, speaker_output = create_microphone_input_and_speaker_output(
    streaming=True, use_default_devices=False
)

conversation = StreamingConversation(
    ...
    synthesizer=AzureSynthesizer(
        AzureSynthesizerConfig.from_output_device(speaker_output)
    ),
    ...
)
```
In this example, the AzureSynthesizerConfig.from_output_device() method is used to create a configuration object for the Azure synthesizer. The method takes a speaker_output object as an argument and extracts the sampling_rate and audio_encoding from the output device.
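That pattern can be sketched standalone (hypothetical names; Vocode's real classes carry more fields): instead of hardcoding the audio parameters, the constructor copies them from the output device so the synthesized audio matches what the speaker expects.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a speaker output device; real Vocode output
# devices expose similar audio attributes.
@dataclass
class FakeSpeakerOutput:
    sampling_rate: int = 44100
    audio_encoding: str = "linear16"

# Sketch (not Vocode's code) of a from_output_device-style constructor.
@dataclass
class SynthConfig:
    sampling_rate: int
    audio_encoding: str

    @classmethod
    def from_output_device(cls, output_device):
        # Mirror the device's audio parameters rather than guessing them.
        return cls(
            sampling_rate=output_device.sampling_rate,
            audio_encoding=output_device.audio_encoding,
        )

config = SynthConfig.from_output_device(FakeSpeakerOutput())
# config now mirrors the device's sampling rate and encoding.
```

This is the inverse of the telephony case: telephone constructors fix the parameters up front, while output-device constructors defer to whatever hardware the user actually has.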