Turn-based conversation
How to use Vocode in non-streaming applications
Overview
A turn-based conversation is an interaction model for applications in which the user utters a single statement and the agent responds in full. Unlike streaming conversations, which try to mimic natural human discourse, it fits applications that are triggered by explicit user input. Consider, for example, a voice memo application in which the user records a message and the agent generates a complete response.
A turn-based conversation system is well suited to applications that don’t require interruptions and have a controlled conversation flow. Each user input is treated as a discrete event, giving the system time to generate and deliver a full and meaningful response.
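To make the "discrete event" idea concrete, here is a minimal, library-free sketch of a single turn: audio is buffered while the user is speaking, and only when the turn ends does the system transcribe, generate a reply, and synthesize it. This is an illustration, not Vocode's implementation. The class `TurnBasedLoop`, its methods, and the three callables are hypothetical stand-ins for a transcriber, an agent, and a synthesizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TurnBasedLoop:
    """Toy turn-based pipeline (hypothetical, not a Vocode class)."""

    transcribe: Callable[[bytes], str]   # speech-to-text stand-in
    respond: Callable[[str], str]        # agent stand-in
    synthesize: Callable[[str], bytes]   # text-to-speech stand-in
    _buffer: List[bytes] = field(default_factory=list)
    _recording: bool = False

    def start_turn(self) -> None:
        # A new turn discards any audio left over from the previous one.
        self._buffer.clear()
        self._recording = True

    def feed_audio(self, chunk: bytes) -> None:
        # Audio that arrives outside a turn is ignored.
        if self._recording:
            self._buffer.append(chunk)

    def end_turn(self) -> bytes:
        # Nothing is processed until the user signals the end of the turn;
        # the whole utterance then goes through the pipeline in one pass.
        self._recording = False
        text = self.transcribe(b"".join(self._buffer))
        reply = self.respond(text)
        return self.synthesize(reply)


def demo_turn() -> bytes:
    # Trivial stand-ins: "audio" is UTF-8 text and the agent echoes the input.
    loop = TurnBasedLoop(
        transcribe=lambda audio: audio.decode("utf-8"),
        respond=lambda text: f"You said: {text}",
        synthesize=lambda text: text.encode("utf-8"),
    )
    loop.start_turn()
    loop.feed_audio(b"hello ")
    loop.feed_audio(b"world")
    return loop.end_turn()
```

Because the reply is computed only in `end_turn`, there is no point at which the agent could be interrupted mid-response, which is the trade-off that separates this model from a streaming conversation.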
Turn-based quickstart
The example below demonstrates a turn-based conversation, using a ChatGPT agent for text generation, WhisperTranscriber for speech-to-text, and AzureSynthesizer for text-to-speech. User interactions trigger the beginning and end of the recording, signaling the system when to listen and when to respond. You can run it with
Remember to replace OPENAI_API_KEY and AZURE_SPEECH_KEY with your actual API keys and set the appropriate Azure region. You can also set these variables in a .env file and source it in your terminal.
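If the keys are supplied through the environment, it can help to fail early with a clear message when one is missing. The sketch below uses only the standard library. `OPENAI_API_KEY` and `AZURE_SPEECH_KEY` come from the text above, while the region variable name `AZURE_SPEECH_REGION` is an assumption, as are the helper names `missing_settings` and `load_settings`.

```python
import os
from typing import Dict, List, Mapping

# AZURE_SPEECH_REGION is an assumed name for the Azure region setting.
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the required variables, raising if any are missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` before constructing the transcriber, agent, and synthesizer surfaces configuration problems before any audio is recorded.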
You can also customize the voice, system prompt, and initial message as needed. The code can be found here.