Conversation orchestration as a service
To have a back-and-forth conversation, you have to do several things:
- Stream and receive audio asynchronously
- Generate responses & understand when to generate responses
- Handle inaccuracies and interruptions
These tasks span three AI systems:
- Speech Recognition
- AI/NLU Layer
- Speech Synthesis
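As a rough illustration of the tasks listed above, the sketch below wires three stand-in stages (recognition, response generation, synthesis) together with `asyncio` queues and shows one simple way to stop a reply when the user interrupts. It is not Vocode's implementation: every name here (`transcribe`, `respond`, `synthesize`, `run_pipeline`, `speak`, `demo_interruption`) is hypothetical, and the "models" are trivial placeholders that treat audio bytes as UTF-8 text.

```python
import asyncio

END = None  # sentinel marking the end of a stream


async def transcribe(audio_in, text_out):
    """Stand-in speech recognition: decodes each audio chunk as UTF-8 text."""
    while (chunk := await audio_in.get()) is not END:
        await text_out.put(chunk.decode())
    await text_out.put(END)


async def respond(text_in, reply_out):
    """Stand-in AI/NLU layer: produces one reply per utterance."""
    while (utterance := await text_in.get()) is not END:
        await reply_out.put(f"You said: {utterance}")
    await reply_out.put(END)


async def synthesize(reply_in, spoken):
    """Stand-in speech synthesis: encodes each reply back to bytes."""
    while (reply := await reply_in.get()) is not END:
        spoken.append(reply.encode())


async def run_pipeline(chunks):
    """Run the three stages concurrently while audio is still being fed in."""
    audio, text, replies = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    spoken = []
    stages = asyncio.gather(
        transcribe(audio, text),
        respond(text, replies),
        synthesize(replies, spoken),
    )
    for chunk in chunks:
        await audio.put(chunk)
    await audio.put(END)
    await stages
    return spoken


async def speak(reply, played, interrupted):
    """Play a reply word by word, stopping as soon as an interruption is flagged."""
    for word in reply.split():
        if interrupted.is_set():
            return
        played.append(word)
        await asyncio.sleep(0)  # yield control, as real playback would


async def demo_interruption():
    """The user barges in while the reply is still playing."""
    played = []
    interrupted = asyncio.Event()

    async def user_barges_in():
        await asyncio.sleep(0)
        interrupted.set()

    await asyncio.gather(
        speak("Sure, here is a long answer", played, interrupted),
        user_barges_in(),
    )
    return played
```

Calling `asyncio.run(run_pipeline([b"hello"]))` pushes one chunk through all three stages, and `asyncio.run(demo_interruption())` returns only the words played before the interruption. A real system would also need to decide when an utterance has ended and tolerate recognition errors, which this sketch leaves out.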
Our core abstraction: the Conversation
Vocode breaks down a Conversation into 5 core pieces:
- Transcriber (used for speech recognition)
- Agent (AI/NLU layer)
- Synthesizer (used for speech synthesis)
- Input Device (microphone for audio in)
- Output Device (speaker for audio out)
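To make the five-piece breakdown concrete, here is a minimal sketch of how such pieces could be expressed as interfaces and composed into one object. It is an illustration of the idea only, not Vocode's actual classes or method signatures: the `Protocol` definitions, the method names (`read`, `transcribe`, `respond`, `synthesize`, `play`), and the toy implementations are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


class InputDevice(Protocol):
    def read(self) -> Optional[bytes]: ...  # None means no more audio


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Agent(Protocol):
    def respond(self, text: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class OutputDevice(Protocol):
    def play(self, audio: bytes) -> None: ...


@dataclass
class Conversation:
    """Hypothetical container that wires the five pieces together."""
    input_device: InputDevice
    transcriber: Transcriber
    agent: Agent
    synthesizer: Synthesizer
    output_device: OutputDevice

    def run(self) -> None:
        # One turn per audio chunk: hear, understand, reply, speak.
        while (audio := self.input_device.read()) is not None:
            text = self.transcriber.transcribe(audio)
            reply = self.agent.respond(text)
            self.output_device.play(self.synthesizer.synthesize(reply))


# Toy implementations that treat audio as UTF-8 text, for demonstration only.
class ListInput:
    def __init__(self, chunks: List[bytes]) -> None:
        self._chunks = list(chunks)

    def read(self) -> Optional[bytes]:
        return self._chunks.pop(0) if self._chunks else None


class Utf8Transcriber:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()


class EchoAgent:
    def respond(self, text: str) -> str:
        return f"You said: {text}"


class Utf8Synthesizer:
    def synthesize(self, text: str) -> bytes:
        return text.encode()


class ListOutput:
    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, audio: bytes) -> None:
        self.played.append(audio)


speaker = ListOutput()
Conversation(
    input_device=ListInput([b"hi"]),
    transcriber=Utf8Transcriber(),
    agent=EchoAgent(),
    synthesizer=Utf8Synthesizer(),
    output_device=speaker,
).run()
print(speaker.played)  # [b'You said: hi']
```

Because each piece only has to satisfy its small interface, any one of them can be swapped without touching the others, which is the point of splitting the conversation this way.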
Transcriber options (e.g. DeepgramTranscriber, AssemblyAITranscriber, GoogleTranscriber) allow you to specify which providers you would like to use and their parameters.
Once you have specified all of the types, Vocode handles everything else necessary to have the conversation.
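The pattern of choosing a provider and its parameters through a type can be sketched as follows. This is a generic illustration under stated assumptions, not Vocode's configuration API: `StubTranscriberConfig`, its `language` and `sampling_rate` fields, the `Stub*` classes, and `build_transcriber` are hypothetical stand-ins, and no real provider is contacted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubTranscriberConfig:
    """Hypothetical config: which provider to use, plus its parameters."""
    provider: str
    language: str = "en"
    sampling_rate: int = 16000


class StubTranscriber:
    """Base class for the stand-in providers below."""

    def __init__(self, config: StubTranscriberConfig) -> None:
        self.config = config

    def describe(self) -> str:
        c = self.config
        return f"{c.provider} ({c.language}, {c.sampling_rate} Hz)"


class StubDeepgram(StubTranscriber):
    pass


class StubAssemblyAI(StubTranscriber):
    pass


class StubGoogle(StubTranscriber):
    pass


# Maps a provider name to the class that would talk to that provider.
REGISTRY = {
    "deepgram": StubDeepgram,
    "assemblyai": StubAssemblyAI,
    "google": StubGoogle,
}


def build_transcriber(config: StubTranscriberConfig) -> StubTranscriber:
    """Pick the provider class named in the config and hand it the parameters."""
    try:
        cls = REGISTRY[config.provider]
    except KeyError:
        raise ValueError(f"unknown provider: {config.provider!r}") from None
    return cls(config)


transcriber = build_transcriber(StubTranscriberConfig("google", language="fr"))
print(transcriber.describe())  # google (fr, 16000 Hz)
```

Switching providers then means changing one config value, while the rest of the conversation setup stays the same.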