How it works
Core concepts that power Vocode.
Conversation orchestration as a service
To have a back-and-forth conversation, you have to do several things:
- Stream input and output audio asynchronously
- Generate responses and understand when a response is called for
- Handle inaccuracies and interruptions
And all of this is done via orchestration of:
- Speech Recognition
- AI/NLU Layer
- Speech Synthesis
Vocode conveniently abstracts away much of the complexity while giving developers the flexibility to control every piece of the conversation.
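To make the orchestration concrete, here is a minimal, self-contained sketch of the pattern Vocode manages for you: three asynchronous stages (speech recognition, AI/NLU, speech synthesis) connected by queues. Everything here is illustrative; none of these function names are Vocode APIs.

```python
import asyncio


async def transcribe(audio_chunks: asyncio.Queue, transcripts: asyncio.Queue):
    # Speech recognition stage: audio chunks in, transcripts out
    while True:
        chunk = await audio_chunks.get()
        transcripts.put_nowait(f"<transcript of {len(chunk)} bytes>")


async def respond(transcripts: asyncio.Queue, responses: asyncio.Queue):
    # AI/NLU stage: decides what to say in response to each transcript
    while True:
        transcript = await transcripts.get()
        responses.put_nowait(f"<reply to: {transcript}>")


async def synthesize(responses: asyncio.Queue):
    # Speech synthesis stage: would stream audio to the output device
    while True:
        response = await responses.get()
        print(f"<audio for: {response}>")


async def main():
    audio_chunks, transcripts, responses = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(transcribe(audio_chunks, transcripts)),
        asyncio.create_task(respond(transcripts, responses)),
        asyncio.create_task(synthesize(responses)),
    ]
    audio_chunks.put_nowait(b"\x00" * 3200)  # fake microphone chunk
    await asyncio.sleep(0.1)  # let the pipeline drain
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


asyncio.run(main())
```

The point of the pattern is that each stage runs concurrently, so the agent can start generating a response while audio is still being transcribed, and playback can be cut short when the user interrupts.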
Our core abstraction: the Conversation
Vocode breaks down a Conversation into 5 core pieces:
- Transcriber (used for speech recognition)
- Agent (AI/NLU layer)
- Synthesizer (used for speech synthesis)
- Input Device (microphone for audio in)
- Output Device (speaker for audio out)
To run an entire conversation, developers specify each of these 5 pieces using the types Vocode provides.
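As a sketch of what this looks like in practice, the snippet below wires together a Deepgram transcriber, a ChatGPT agent, an Azure synthesizer, and the default microphone and speaker, following Vocode's Python quickstart. Exact module paths and helper names may differ between Vocode versions.

```python
import asyncio
import signal

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.agent.chat_gpt_agent import ChatGPTAgent
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber


async def main():
    # Input Device and Output Device: the default microphone and speaker
    microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=True,
    )
    conversation = StreamingConversation(
        output_device=speaker_output,
        # Transcriber: Deepgram for speech recognition
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
            )
        ),
        # Agent: the AI/NLU layer that decides what to say
        agent=ChatGPTAgent(
            ChatGPTAgentConfig(
                initial_message=BaseMessage(text="Hello!"),
                prompt_preamble="Have a pleasant conversation",
            )
        ),
        # Synthesizer: Azure for speech synthesis
        synthesizer=AzureSynthesizer(
            AzureSynthesizerConfig.from_output_device(speaker_output)
        ),
    )
    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    signal.signal(
        signal.SIGINT, lambda _sig, _frame: asyncio.create_task(conversation.terminate())
    )
    # Pump microphone audio into the conversation until it ends
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
```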
As an example, there are several Transcriber options (e.g. `DeepgramTranscriber`, `AssemblyAITranscriber`, `GoogleTranscriber`) that let you choose which provider to use and set its parameters.
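For instance, a Deepgram-backed transcriber might be configured as below; the specific parameter values (sampling rate, encoding, chunk size, model name) are illustrative and depend on your audio source and Deepgram account.

```python
from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber

# Pick Deepgram as the speech recognition provider and set its parameters.
transcriber = DeepgramTranscriber(
    DeepgramTranscriberConfig(
        sampling_rate=16000,               # illustrative: match your input device
        audio_encoding=AudioEncoding.LINEAR16,
        chunk_size=2048,
        model="nova",                      # illustrative Deepgram model name
        endpointing_config=PunctuationEndpointingConfig(),
    )
)
```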
Once you have specified all of the types, Vocode handles everything else needed to run the conversation.