Conversation orchestration as a service
To hold a back-and-forth conversation, an application has to do several things at once:
- Stream and receive audio asynchronously
- Generate responses, and decide when to generate them
- Handle inaccuracies and interruptions
All of this is done by orchestrating three layers:
- Speech Recognition
- AI/NLU Layer
- Speech Synthesis
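The orchestration above can be sketched as a toy asyncio pipeline. This is not Vocode's actual API, just a stub illustration of why the three layers must run concurrently: each stage consumes from an input queue and produces into an output queue while the others keep streaming.

```python
import asyncio

async def transcriber(audio_in: asyncio.Queue, transcripts: asyncio.Queue):
    # Speech recognition stub: turns audio chunks into text.
    while (chunk := await audio_in.get()) is not None:
        await transcripts.put(f"transcript({chunk})")
    await transcripts.put(None)  # propagate end-of-stream

async def agent(transcripts: asyncio.Queue, replies: asyncio.Queue):
    # AI/NLU stub: decides what (and when) to respond.
    while (text := await transcripts.get()) is not None:
        await replies.put(f"reply-to({text})")
    await replies.put(None)

async def synthesizer(replies: asyncio.Queue, audio_out: asyncio.Queue):
    # Speech synthesis stub: turns reply text back into audio.
    while (reply := await replies.get()) is not None:
        await audio_out.put(f"audio({reply})")

async def run_pipeline(chunks):
    audio_in, transcripts, replies, audio_out = (asyncio.Queue() for _ in range(4))
    for c in chunks:
        audio_in.put_nowait(c)
    audio_in.put_nowait(None)  # end of microphone stream
    # All three stages run concurrently, like a real-time conversation.
    await asyncio.gather(
        transcriber(audio_in, transcripts),
        agent(transcripts, replies),
        synthesizer(replies, audio_out),
    )
    return [audio_out.get_nowait() for _ in range(audio_out.qsize())]

out = asyncio.run(run_pipeline(["chunk1", "chunk2"]))
print(out)
```

A real system adds endpointing (knowing when the user has finished speaking) and interruption handling (cancelling in-flight synthesis when new speech arrives) on top of this skeleton.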
Vocode conveniently abstracts away much of the complexity while giving developers the flexibility to control every piece of the conversation.
Our core abstraction: the Conversation
Vocode breaks down a Conversation into 5 core pieces:
- Transcriber (used for speech recognition)
- Agent (AI/NLU layer)
- Synthesizer (used for speech synthesis)
- Input Device (microphone for audio in)
- Output Device (speaker for audio out)
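To make the decomposition concrete, here is a minimal sketch of a conversation built from five swappable pieces. The class and field names mirror the list above but are illustrative only, not Vocode's real types.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Conversation:
    # Five pluggable pieces, mirroring the decomposition above.
    input_device: Callable[[], Iterable[bytes]]  # microphone: yields audio chunks
    transcriber: Callable[[bytes], str]          # speech recognition
    agent: Callable[[str], str]                  # AI/NLU layer
    synthesizer: Callable[[str], bytes]          # speech synthesis
    output_device: Callable[[bytes], None]       # speaker

    def run(self) -> None:
        # Wire the pieces: audio in -> text -> reply -> audio out.
        for chunk in self.input_device():
            text = self.transcriber(chunk)
            reply = self.agent(text)
            self.output_device(self.synthesizer(reply))

# Dummy components, just to show the wiring.
spoken: List[bytes] = []
convo = Conversation(
    input_device=lambda: [b"hello"],
    transcriber=lambda chunk: chunk.decode(),
    agent=lambda text: text.upper(),
    synthesizer=lambda reply: reply.encode(),
    output_device=spoken.append,
)
convo.run()
print(spoken)  # [b'HELLO']
```

Because each piece is an interface rather than a fixed implementation, any one of them can be replaced (a different transcription provider, a different model behind the agent) without touching the rest.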
To run an entire conversation, developers specify each of these 5 pieces using the types Vocode provides.
For example, there are several Transcriber options (e.g. GoogleTranscriber) that let you choose which provider to use and configure its parameters.
Once all of the types are specified, Vocode handles everything else needed to run the conversation.
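The provider-selection pattern described above can be sketched as a small registry: a config object names the provider and its parameters, and a factory constructs the matching transcriber. The config fields and the registry itself are hypothetical; only the GoogleTranscriber-style option is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class TranscriberConfig:
    provider: str
    language: str = "en-US"   # hypothetical parameter
    sample_rate: int = 16000  # hypothetical parameter

class GoogleTranscriber:
    """Stub standing in for a provider-specific transcriber."""
    def __init__(self, config: TranscriberConfig):
        self.config = config

# Registry: provider name -> transcriber constructor.
TRANSCRIBERS: Dict[str, Callable[[TranscriberConfig], object]] = {
    "google": GoogleTranscriber,
}

def make_transcriber(config: TranscriberConfig):
    return TRANSCRIBERS[config.provider](config)

t = make_transcriber(TranscriberConfig(provider="google", language="en-GB"))
print(type(t).__name__, t.config.language)
```

Swapping providers then becomes a one-line config change rather than a code change, which is the flexibility the section describes.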