Overview

Transcribers turn the speech in your application's audio input into text. In this guide, we will show you how to configure and use transcribers in Vocode.

Supported Transcribers

Vocode currently supports the following transcribers:

  1. Deepgram
  2. Google
  3. AssemblyAI
  4. Whisper CPP
  5. Rev AI
  6. Azure

Each of these transcribers is configured via its own configuration class, a subclass of TranscriberConfig.
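
If the factory methods shown below don't fit your setup, you can also construct a configuration directly. Here is a minimal sketch, assuming the core TranscriberConfig fields (sampling_rate, audio_encoding, chunk_size, endpointing_config); the values shown are illustrative for 16 kHz linear PCM audio:

from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, TimeEndpointingConfig

# illustrative values; match them to your actual audio source
config = DeepgramTranscriberConfig(
    sampling_rate=16000,
    audio_encoding=AudioEncoding.LINEAR16,
    chunk_size=2048,
    endpointing_config=TimeEndpointingConfig(),
)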

Configuring Transcribers

To use a transcriber, you create a configuration object for the transcriber you want. Below are two examples, both using Deepgram: one for a phone call and one for a local streaming conversation.

Example 1: Using Deepgram with a phone call

from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig

server = InboundCallServer(
    ...
    # the telephony factory method presets sampling rate, encoding, and chunk size
    transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
      endpointing_config=PunctuationEndpointingConfig()
    ),
    ...
)

In this example, the DeepgramTranscriberConfig.from_telephone_input_device() method is used to create a configuration object for the Deepgram transcriber. The method hardcodes some values like the sampling_rate, audio_encoding, and chunk_size for compatibility with telephone input devices.
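
Under the hood, the telephony preset is roughly equivalent to building the config by hand with telephone-grade audio parameters. A sketch, assuming standard 8 kHz mulaw phone audio (the exact chunk_size Vocode uses may differ):

from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig

# assumed telephony parameters: 8 kHz mulaw is the standard for phone audio
config = DeepgramTranscriberConfig(
    sampling_rate=8000,
    audio_encoding=AudioEncoding.MULAW,
    chunk_size=3200,
    endpointing_config=PunctuationEndpointingConfig(),
)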

Example 2: Using Deepgram in StreamingConversation locally

from vocode.helpers import create_microphone_input_and_speaker_output
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber

async def main():
    # grab streaming microphone and speaker devices
    microphone_input, speaker_output = create_microphone_input_and_speaker_output(
        streaming=True, use_default_devices=False
    )

    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input, endpointing_config=PunctuationEndpointingConfig()
            )
        ),
        ...
    )

In this example, the DeepgramTranscriberConfig.from_input_device() method is used to create a configuration object for the Deepgram transcriber for use in a local StreamingConversation. The method takes a microphone_input object as an argument and extracts the sampling_rate, audio_encoding, and chunk_size from the input device.
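
To complete the picture, here is roughly how the conversation is driven once constructed, following the pattern in Vocode's streaming quickstart (treat the exact method names, like get_audio and receive_audio, as a sketch against your installed version):

    # continuing main() from above
    await conversation.start()
    while conversation.is_active():
        # forward microphone chunks into the conversation
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)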

Endpointing

Endpointing is the process of detecting when someone has finished speaking. The EndpointingConfig controls how this is done. There are two ways to configure endpointing:

  • Time-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence.
  • Punctuation-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence after a punctuation mark.

In both examples above, the PunctuationEndpointingConfig is used to configure the Deepgram transcriber for punctuation-based endpointing.
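
Switching strategies is just a matter of swapping the endpointing config. A minimal sketch; the time_cutoff_seconds parameter name on TimeEndpointingConfig is an assumption, so verify it against your Vocode version:

from vocode.streaming.models.transcriber import PunctuationEndpointingConfig, TimeEndpointingConfig

# time-based: consider the utterance finished after ~1 second of silence
# (time_cutoff_seconds is assumed; check your installed version)
time_endpointing = TimeEndpointingConfig(time_cutoff_seconds=1.0)

# punctuation-based: finish after a short silence following a punctuation mark
punctuation_endpointing = PunctuationEndpointingConfig()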