Overview

To produce audio in a conversation, developers specify a synthesizer configuration that matches their needs.

SynthesizerConfig Class

The base SynthesizerConfig class defines the fields that all implementations share.

sampling_rate
int
required

The sampling rate of the audio to be synthesized.

audio_encoding
AudioEncoding
required

The encoding format of the audio to be synthesized.
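As a rough illustration of the two required fields, here is a minimal, self-contained sketch. The dataclass and enum below are hypothetical stand-ins that mirror the documented field names, not the library's own classes, and the enum members (LINEAR16, MULAW) and the 16000 Hz value are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class AudioEncoding(str, Enum):
    # Assumed members for illustration; check the library for the real set.
    LINEAR16 = "linear16"
    MULAW = "mulaw"


@dataclass
class SynthesizerConfig:
    # Stand-in for the documented base class: both fields are required,
    # so neither has a default value.
    sampling_rate: int
    audio_encoding: AudioEncoding


# Example values only: 16 kHz linear PCM.
base_config = SynthesizerConfig(
    sampling_rate=16000,
    audio_encoding=AudioEncoding.LINEAR16,
)
```

Because neither field has a default in this sketch, omitting one raises a TypeError at construction time, which matches the "required" marker on both fields.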

Synthesizer Implementations

AzureSynthesizerConfig

Experiment with Azure voices on the Azure web playground.

voice_name
Optional[str]

The voice name to be used for synthesis.

pitch
Optional[int]

The pitch shift to apply to the synthesized audio. Valid values range from -50 to 50.

rate
Optional[int]

The speaking rate to use for synthesis, specified as a percentage change relative to the default rate (rate=20 means 20% faster than the default, i.e. 120% of the default speed).
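The arithmetic behind pitch and rate can be sketched as follows. This is a hypothetical stand-in class, not the library's AzureSynthesizerConfig: the range check and the speed_multiplier helper are assumptions added for illustration, and the voice name is only an example value. It treats rate as a percentage offset from the default speed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AzureVoiceSettings:
    # Hypothetical stand-in mirroring the three documented optional fields.
    voice_name: Optional[str] = None
    pitch: Optional[int] = None
    rate: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumed validation: enforce the documented pitch range of -50 to 50.
        if self.pitch is not None and not -50 <= self.pitch <= 50:
            raise ValueError(f"pitch must be between -50 and 50, got {self.pitch}")

    def speed_multiplier(self) -> float:
        # rate is a percentage offset from the default speed:
        # rate=20 -> 1.2x, rate=-10 -> 0.9x, unset -> 1.0x.
        return 1.0 + (self.rate or 0) / 100


settings = AzureVoiceSettings(voice_name="en-US-JennyNeural", pitch=0, rate=20)
multiplier = settings.speed_multiplier()  # 20% faster than the default
```

Leaving rate unset gives a multiplier of 1.0 in this sketch, i.e. the default speaking speed.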

ElevenLabsSynthesizerConfig

ElevenLabs is a state-of-the-art TTS and voice-cloning API. Note that its API latency is unreliable and depends on traffic: it can spike from under a second to more than 5 seconds. It is also incompatible with our bash demo and works only on the web.

api_key
str
required

Your API key to use for authentication with the ElevenLabs API.

voice_id
Optional[str]

The ID of the voice to use for synthesis.
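One common way to supply the required api_key without hard-coding it is to read it from the environment. The sketch below assumes this pattern; the class is a hypothetical stand-in for ElevenLabsSynthesizerConfig, and the helper name and both environment variable names are illustrative assumptions, not names confirmed by this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ElevenLabsSettings:
    # Hypothetical stand-in: api_key is required, voice_id is optional.
    api_key: str
    voice_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail early on an empty key rather than at the first API call.
        if not self.api_key:
            raise ValueError("api_key is required")


def settings_from_env(
    env: Mapping[str, str] = os.environ,
    key_var: str = "ELEVEN_LABS_API_KEY",      # assumed variable name
    voice_var: str = "ELEVEN_LABS_VOICE_ID",   # assumed variable name
) -> ElevenLabsSettings:
    # voice_id stays None when the variable is unset.
    return ElevenLabsSettings(
        api_key=env.get(key_var, ""),
        voice_id=env.get(voice_var),
    )
```

Passing the environment as a parameter keeps the helper easy to test: a plain dict can stand in for os.environ.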