To produce sound in the conversation, developers can specify a synthesizer configuration that matches their needs.
The SynthesizerConfig class defines several key fields that all implementations share:
The sampling rate of the audio to be synthesized.
The encoding format of the audio to be synthesized.
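The Python sketch below shows one way these shared fields might be modeled. The class name `SynthesizerConfigSketch`, the field names `sampling_rate` and `audio_encoding`, and the `AudioEncoding` values are all hypothetical stand-ins. They are not the library's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum


class AudioEncoding(Enum):
    # Hypothetical encodings; the real set depends on the library.
    LINEAR16 = "linear16"
    MULAW = "mulaw"


@dataclass
class SynthesizerConfigSketch:
    """Hypothetical base config holding the fields every synthesizer shares."""

    sampling_rate: int  # samples per second, e.g. 16000 or 8000
    audio_encoding: AudioEncoding  # encoding format of the synthesized audio

    def __post_init__(self) -> None:
        # Reject obviously invalid sampling rates early.
        if self.sampling_rate <= 0:
            raise ValueError("sampling_rate must be positive")


# Example: 8 kHz mu-law audio, a common format for telephony.
phone_config = SynthesizerConfigSketch(sampling_rate=8000, audio_encoding=AudioEncoding.MULAW)
print(phone_config)
```

Every provider-specific configuration would extend a base class like this one, so the sampling rate and encoding are always present.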
For Azure, you can try out the available voices on Azure's web playground. The Azure configuration adds the following fields:
The voice name to be used for synthesis.
The pitch shift to apply to the synthesized audio, in the range [-50, 50].
The speaking rate to use for synthesis, specified as a percentage change from the default rate (for example, rate=20 means 20% faster than the default, or 120% of the default speed).
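The sketch below shows how the Azure fields could be validated and how `rate` maps to an effective speed. It assumes that `rate` is a percentage change from the default, so rate=20 gives 1.2x. The class, its field names, and the `speed_multiplier` helper are hypothetical. The voice name is only an example.

```python
from dataclasses import dataclass


@dataclass
class AzureSynthesizerConfigSketch:
    """Hypothetical Azure-specific settings; field names are illustrative."""

    voice_name: str  # e.g. a voice picked from Azure's web playground
    pitch: int = 0  # pitch shift, must lie in [-50, 50]
    rate: int = 0  # assumed: percentage change from the default speaking rate

    def __post_init__(self) -> None:
        # Enforce the documented pitch range.
        if not -50 <= self.pitch <= 50:
            raise ValueError(f"pitch must be in [-50, 50], got {self.pitch}")

    def speed_multiplier(self) -> float:
        """Speaking speed relative to the default (rate=20 -> 1.2)."""
        return 1 + self.rate / 100


config = AzureSynthesizerConfigSketch(voice_name="en-US-AriaNeural", rate=20)
print(config.speed_multiplier())
```

The pitch range is checked when the object is constructed. An out-of-range value therefore fails immediately instead of surfacing later as a synthesis error.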
ElevenLabs is a state-of-the-art TTS and voice cloning API. Its latency is unpredictable and depends on traffic: a response can take anywhere from under a second to more than 5 seconds. ElevenLabs is also not compatible with our bash demo and only works on the web.
The API key used to authenticate with the ElevenLabs API.
The ID of the voice to use for synthesis.
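The sketch below holds the two ElevenLabs fields. It reads the API key from an environment variable so the key is not hard-coded, and it keeps the key out of the printed representation. The class name, the field names, and the `ELEVEN_LABS_API_KEY` variable name are hypothetical and may differ from the library's own.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ElevenLabsSynthesizerConfigSketch:
    """Hypothetical ElevenLabs settings; field names are illustrative."""

    voice_id: str  # ID of the ElevenLabs voice to synthesize with
    # Fall back to an (assumed) environment variable instead of hard-coding the key;
    # repr=False keeps the key out of logs and printed output.
    api_key: str = field(
        default_factory=lambda: os.environ.get("ELEVEN_LABS_API_KEY", ""),
        repr=False,
    )

    def __post_init__(self) -> None:
        # Fail fast if no key was passed or found in the environment.
        if not self.api_key:
            raise ValueError("an ElevenLabs API key is required")


config = ElevenLabsSynthesizerConfigSketch(voice_id="YOUR_VOICE_ID", api_key="YOUR_API_KEY")
print(config)  # the api_key field is omitted from the repr
```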