Using transcribers
How to configure and use transcribers for speech recognition in your application.
Overview
Transcribers are used to configure the speech recognition of your application. In this guide, we will show you how to configure and use transcribers in Vocode.
Supported Transcribers
Vocode currently supports the following transcribers:
- Deepgram
- Assembly AI
- Whisper CPP
- Rev AI
- Azure
These transcribers are defined using their respective configuration classes, which are subclasses of the TranscriberConfig
class.
Configuring Transcribers
To use a transcriber, you need to create a configuration object for the transcriber you want to use. Here are some examples of how to create configuration objects for different transcribers:
Example 1: Using Deepgram with a phone call
from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig
server = InboundCallServer(
...
transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
endpointing_config=DeepgramEndpointingConfig()
),
...
)
In this example, the DeepgramTranscriberConfig.from_telephone_input_device()
method is used to create a configuration object for the Deepgram transcriber. The method hardcodes some values like the sampling_rate
, audio_encoding
, and chunk_size
for compatibility with telephone input devices.
Example 2: Using Deepgram in StreamingConversation
locally
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig, PunctuationEndpointingConfig
from vocode.streaming import StreamingConversation
async def main():
microphone_input, speaker_output = create_microphone_input_and_speaker_output(
streaming=True, use_default_devices=False
)
conversation = StreamingConversation(
output_device=speaker_output,
transcriber=DeepgramTranscriber(
DeepgramTranscriberConfig.from_input_device(
microphone_input, endpointing_config=DeepgramEndpointingConfig()
)
),
...
)
In this example, the DeepgramTranscriberConfig.from_input_device()
method is used to create a configuration object for the Deepgram transcriber for use in a local StreamingConversation
.
The method takes a microphone_input
object as an argument and extracts the sampling_rate
, audio_encoding
, and chunk_size
from the input device.
See Conversation Mechanics for more information about endpointing.