Transcribers are used to configure the speech recognition of your application. In this guide, we will show you how to configure and use transcribers in Vocode.
Vocode currently supports the following transcribers:
- Deepgram
- Assembly AI
- Whisper CPP
- Rev AI
These transcribers are defined using their respective configuration classes, which are subclasses of the `TranscriberConfig` class.
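Each provider has its own configuration class in `vocode.streaming.models.transcriber`. As a rough sketch, the class names below are assumed to follow the same `<Provider>TranscriberConfig` naming pattern as the `DeepgramTranscriberConfig` used in the examples that follow; check the module's exports for the exact names:

```python
# Assumed class names following the <Provider>TranscriberConfig pattern;
# verify against vocode.streaming.models.transcriber in your installed version.
from vocode.streaming.models.transcriber import (
    AssemblyAITranscriberConfig,
    WhisperCPPTranscriberConfig,
    RevAITranscriberConfig,
)
```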
To use a transcriber, you need to create a configuration object for the transcriber you want to use. Here are some examples of how to create configuration objects for different transcribers:
Example 1: Using Deepgram with a phone call
```python
from vocode.streaming.telephony.hosted.inbound_call_server import InboundCallServer
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

server = InboundCallServer(
    ...
    transcriber_config=DeepgramTranscriberConfig.from_telephone_input_device(
        endpointing_config=PunctuationEndpointingConfig()
    ),
    ...
)
```
In this example, the `DeepgramTranscriberConfig.from_telephone_input_device()` method is used to create a configuration object for the Deepgram transcriber. The method hardcodes some values, like the `chunk_size`, for compatibility with telephone input devices.
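For intuition, calling the factory is roughly equivalent to constructing the config by hand with telephone-friendly values. The specific values below are illustrative assumptions, not the library's documented defaults:

```python
from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

# Illustrative values only (assumed): telephone audio is typically 8 kHz mulaw,
# delivered in fixed-size frames, which is what the factory accounts for.
config = DeepgramTranscriberConfig(
    sampling_rate=8000,
    audio_encoding=AudioEncoding.MULAW,
    chunk_size=20 * 160,
    endpointing_config=PunctuationEndpointingConfig(),
)
```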
Example 2: Using Deepgram in a local StreamingConversation
```python
# The DeepgramTranscriber and microphone-helper imports were missing from the
# original snippet; their paths may vary slightly by Vocode version.
from vocode.helpers import create_microphone_input_and_speaker_output
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.transcriber import DeepgramTranscriber


async def main():
    microphone_input, speaker_output = create_microphone_input_and_speaker_output(
        streaming=True, use_default_devices=False
    )
    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
            )
        ),
        ...
    )
```
In this example, the `DeepgramTranscriberConfig.from_input_device()` method is used to create a configuration object for the Deepgram transcriber for use in a local `StreamingConversation`. The method takes a `microphone_input` object as an argument and extracts the `chunk_size` from the input device.
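To actually run the example, you need an async entry point that starts the conversation and feeds it microphone audio. A minimal sketch, assuming the quickstart-style API (`start()`, `is_active()`, `get_audio()`, and `receive_audio()`); treat the exact calls as an illustration rather than the documented interface:

```python
import asyncio


async def run_conversation(conversation, microphone_input) -> None:
    # Start the conversation, then pump microphone audio into it until it ends.
    await conversation.start()
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)
```

In the example above, `main()` would call this (or inline the same loop) after constructing the conversation, and the script would be launched with `asyncio.run(main())`.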
Endpointing is the process of understanding when someone has finished speaking. The `EndpointingConfig` controls how this is done. There are a couple of different ways to configure endpointing:
- Time-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence.
- Punctuation-based endpointing: This method considers the speaker to be finished when there is a certain duration of silence after a punctuation mark.
In the first example, the `PunctuationEndpointingConfig` is used to configure the Deepgram transcriber for punctuation-based endpointing.
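To switch strategies, swap the endpointing config passed into the transcriber config. A minimal sketch, assuming a time-based config named `TimeEndpointingConfig` with a silence-cutoff parameter (the exact class and field names may differ in your Vocode version):

```python
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
    TimeEndpointingConfig,  # assumed name of the time-based config
)

# Time-based: the utterance ends after a fixed stretch of silence.
# The field name below is an assumption for illustration.
time_based = TimeEndpointingConfig(time_cutoff_seconds=1.0)

# Punctuation-based: the utterance ends after silence that follows a punctuation mark.
punctuation_based = PunctuationEndpointingConfig()

transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    endpointing_config=time_based
)
```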