Synthesizer
Provides text-to-speech from a variety of providers.
Overview
To produce sound in the conversation, developers can specify a synthesizer configuration that matches their needs.
SynthesizerConfig Class
The base SynthesizerConfig class defines several key fields that all implementations use.
The sampling rate of the audio to be synthesized.
The encoding format of the audio to be synthesized.
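The two shared fields can be pictured with a small standalone sketch. It does not reproduce the library's actual class: the names `SynthesizerConfigSketch`, `AudioEncodingSketch`, `sampling_rate`, and `audio_encoding`, as well as the two encoding values, are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AudioEncodingSketch(Enum):
    # Hypothetical encoding names, for illustration only.
    LINEAR16 = "linear16"
    MULAW = "mulaw"


@dataclass
class SynthesizerConfigSketch:
    # Assumed field names; the real base class may name these differently.
    sampling_rate: int  # samples per second of the synthesized audio
    audio_encoding: AudioEncodingSketch  # encoding format of the synthesized audio

    def __post_init__(self) -> None:
        # A sampling rate of zero or less cannot describe real audio.
        if self.sampling_rate <= 0:
            raise ValueError("sampling_rate must be a positive integer")


base_config = SynthesizerConfigSketch(
    sampling_rate=16000,
    audio_encoding=AudioEncodingSketch.LINEAR16,
)
```

Provider-specific configurations would then add their own fields on top of these two.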
Synthesizer Implementations
AzureSynthesizerConfig
Experiment with Azure voices on their web playground
The voice name to be used for synthesis.
The pitch shift to apply to the synthesized audio. Must be in the range [-50, 50].
The speaking rate to use for synthesis, specified as a percentage change relative to the default rate (rate=20 means 20% faster than the default, i.e. 120% of the default rate).
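As a rough illustration of how the pitch range and the rate percentage might be interpreted, here is a standalone sketch. The class `AzureConfigSketch`, its field names, and the helper `speed_multiplier` are hypothetical and not taken from the library; the sketch assumes that the rate is a percentage change, so rate=20 corresponds to 1.2 times the default speed.

```python
from dataclasses import dataclass


@dataclass
class AzureConfigSketch:
    # Assumed field names; not the library's actual definition.
    voice_name: str
    pitch: int = 0  # pitch shift, documented range [-50, 50]
    rate: int = 0   # percentage change relative to the default speaking rate

    def __post_init__(self) -> None:
        # Reject pitch values outside the documented range.
        if not -50 <= self.pitch <= 50:
            raise ValueError("pitch must be within [-50, 50]")

    def speed_multiplier(self) -> float:
        # rate=20 -> 120 / 100 = 1.2 times the default speed.
        return (100 + self.rate) / 100


# "en-US-JennyNeural" is used here only as an example voice name.
azure_config = AzureConfigSketch(voice_name="en-US-JennyNeural", pitch=5, rate=20)
multiplier = azure_config.speed_multiplier()
```

Validating the pitch at construction time surfaces an out-of-range value before any request is sent to the provider.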
ElevenLabsSynthesizerConfig
ElevenLabs provides a state-of-the-art TTS and voice cloning API. Note that their API latency is unreliable and depends on traffic (it can spike from sub-second latency to more than 5 seconds). Also, it is not compatible with our bash demo (it only works on web).
Your API key to use for authentication with the ElevenLabs API.
The ID of the voice to use for synthesis.
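Since this configuration carries a secret, one common pattern is to read the API key from the environment rather than hard-coding it. The sketch below is standalone and makes several assumptions: the class `ElevenLabsConfigSketch`, its field names, the helper `load_elevenlabs_config`, and the environment variable name `ELEVEN_LABS_API_KEY` are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ElevenLabsConfigSketch:
    # Assumed field names; not the library's actual definition.
    api_key: str   # key used to authenticate with the ElevenLabs API
    voice_id: str  # ID of the voice to use for synthesis


def load_elevenlabs_config(
    voice_id: str,
    env: Mapping[str, str] = os.environ,
) -> ElevenLabsConfigSketch:
    # ELEVEN_LABS_API_KEY is a hypothetical variable name for this sketch.
    api_key = env.get("ELEVEN_LABS_API_KEY")
    if not api_key:
        raise RuntimeError("ELEVEN_LABS_API_KEY is not set")
    return ElevenLabsConfigSketch(api_key=api_key, voice_id=voice_id)


# Placeholder values stand in for a real key and voice ID.
demo_config = load_elevenlabs_config(
    "example-voice-id",
    env={"ELEVEN_LABS_API_KEY": "placeholder-key"},
)
```

Passing the environment as a parameter keeps the helper easy to exercise without touching real credentials.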