Overview

To produce speech audio in a conversation, developers specify a synthesizer configuration that matches their needs.

SynthesizerConfig Class

The base SynthesizerConfig class defines two fields that all implementations share.
sampling_rate
int
required
The sampling rate of the audio to be synthesized.
audio_encoding
AudioEncoding
required
The encoding format of the audio to be synthesized.
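
As a rough illustration of how the two required fields fit together, here is a self-contained sketch. It does not import the real library: the dataclass below is a stand-in that mirrors the documented fields, the AudioEncoding members (LINEAR16, MULAW) are assumptions, and bytes_per_second is a hypothetical helper that assumes mono audio.

```python
from dataclasses import dataclass
from enum import Enum


class AudioEncoding(str, Enum):
    # Assumed members for illustration; check the library for the real list.
    LINEAR16 = "linear16"
    MULAW = "mulaw"


@dataclass
class SynthesizerConfig:
    # Stand-in for the documented base class, not the library's own definition.
    sampling_rate: int
    audio_encoding: AudioEncoding


def bytes_per_second(config: SynthesizerConfig) -> int:
    """Hypothetical helper: raw audio bandwidth for mono output."""
    # 16-bit linear PCM uses 2 bytes per sample; mu-law uses 1 byte per sample.
    bytes_per_sample = 2 if config.audio_encoding is AudioEncoding.LINEAR16 else 1
    return config.sampling_rate * bytes_per_sample


config = SynthesizerConfig(sampling_rate=16000, audio_encoding=AudioEncoding.LINEAR16)
print(bytes_per_second(config))
```

Both fields are required because downstream consumers of the audio need the sampling rate and the encoding to interpret the synthesized bytes.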

Synthesizer Implementations

AzureSynthesizerConfig

Experiment with Azure voices on the Azure web playground.
voice_name
Optional[str]
The voice name to be used for synthesis.
pitch
Optional[int]
The pitch shift to apply to the synthesized audio. Valid range: [-50, 50].
rate
Optional[int]
The speaking rate to use for synthesis, specified as a percentage change relative to the default rate (rate=20 means 120% of the default rate, i.e. 20% faster).
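
The Azure-specific fields can be sketched as follows. This is a stand-in dataclass, not the library's class: AzureConfigSketch and speed_factor are hypothetical names, the range check simply enforces the documented [-50, 50] bounds, and the speed calculation assumes rate is a percentage change relative to the default, so that rate=20 corresponds to 1.2 times the default speed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AzureConfigSketch:
    # Hypothetical stand-in mirroring the documented optional fields.
    voice_name: Optional[str] = None
    pitch: Optional[int] = None
    rate: Optional[int] = None

    def __post_init__(self) -> None:
        # Enforce the documented pitch range of [-50, 50].
        if self.pitch is not None and not -50 <= self.pitch <= 50:
            raise ValueError("pitch must be within [-50, 50]")


def speed_factor(rate: Optional[int]) -> float:
    """Assumed interpretation: rate is a percent change from the default speed."""
    return 1.0 if rate is None else 1 + rate / 100


config = AzureConfigSketch(pitch=10, rate=20)
print(speed_factor(config.rate))
```

Leaving a field as None in this sketch stands for "use the default", which matches the Optional types listed above.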

ElevenLabsSynthesizerConfig

ElevenLabs is a state-of-the-art TTS and voice cloning API. Note that its API latency is variable and depends on traffic (it can spike from sub-second latency to >5 seconds). It is also not compatible with our bash demo; it only works on web.
api_key
str
required
Your API key to use for authentication with the ElevenLabs API.
voice_id
Optional[str]
The ID of the voice to use for synthesis.
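
Since api_key is required, one common pattern is to read it from the environment rather than hard-coding it. The sketch below is self-contained and does not use the real library: ElevenLabsConfigSketch and config_from_env are hypothetical names, and the ELEVEN_LABS_API_KEY variable name is an assumption chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ElevenLabsConfigSketch:
    # Hypothetical stand-in mirroring the documented fields.
    api_key: str
    voice_id: Optional[str] = None


def config_from_env(
    env: Mapping[str, str] = os.environ,
    voice_id: Optional[str] = None,
) -> ElevenLabsConfigSketch:
    """Build the config from an environment mapping (variable name is assumed)."""
    api_key = env.get("ELEVEN_LABS_API_KEY")
    if not api_key:
        # Fail early: the API key is a required field.
        raise ValueError("ELEVEN_LABS_API_KEY is not set")
    return ElevenLabsConfigSketch(api_key=api_key, voice_id=voice_id)


# Usage with an explicit mapping, so the example does not depend on the real environment.
example = config_from_env({"ELEVEN_LABS_API_KEY": "example-key"}, voice_id="example-voice")
print(example.voice_id)
```

Passing the environment as a parameter keeps the helper easy to test and keeps the key out of source control.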