TranscriberConfig

sampling_rate
int
The sampling rate of the audio in samples per second (Hz). A higher sampling rate provides better audio quality but may increase processing time and data size.
audio_encoding
AudioEncoding
The encoding format of the audio data. Options include: LINEAR16, MULAW.
chunk_size
int
The size of each chunk of audio data sent to the transcriber, in bytes. A larger chunk size can reduce network overhead but may increase latency.
endpointing_config
Optional[EndpointingConfig]
Optional endpointing configuration to determine when a transcription segment should end. If not provided, default endpointing behavior will be used.
downsampling
Optional[int]
Optional downsampling factor to reduce the sampling rate of the audio before sending to the transcriber. Can be used to reduce bandwidth usage.
min_interrupt_confidence
Optional[float]
Optional minimum confidence threshold for interrupting the transcription. Confidence values range from 0 to 1; higher values indicate greater confidence. If provided, transcriptions are interrupted only when the confidence exceeds the threshold. If not provided, the default interruption behavior is used.
mute_during_speech
bool
If true, chunks of silent audio are sent to the transcriber while a transcription is in progress. Can be used to prevent echo during live transcription.
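
The relationship between chunk_size, sampling_rate, and audio_encoding can be sketched in a few lines. This is a minimal illustration and not the library's own class: SketchTranscriberConfig and its helper methods are hypothetical, the audio is assumed to be mono, the downsampling factor is assumed to divide the sampling rate, and "always interrupt" is only a stand-in for the unspecified default interrupting behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AudioEncoding(Enum):
    LINEAR16 = "linear16"
    MULAW = "mulaw"


# LINEAR16 is 16-bit PCM (2 bytes per sample); mu-law uses 1 byte per sample.
BYTES_PER_SAMPLE = {AudioEncoding.LINEAR16: 2, AudioEncoding.MULAW: 1}


@dataclass
class SketchTranscriberConfig:
    """Hypothetical stand-in for the fields listed above (mono audio assumed)."""

    sampling_rate: int
    audio_encoding: AudioEncoding
    chunk_size: int
    downsampling: Optional[int] = None
    min_interrupt_confidence: Optional[float] = None
    mute_during_speech: bool = False

    def chunk_duration_seconds(self) -> float:
        # Bytes -> samples -> seconds. Larger chunks mean more latency per send.
        samples = self.chunk_size / BYTES_PER_SAMPLE[self.audio_encoding]
        return samples / self.sampling_rate

    def effective_sampling_rate(self) -> int:
        # Assumption: the factor divides the original rate.
        return self.sampling_rate // (self.downsampling or 1)

    def should_interrupt(self, confidence: float) -> bool:
        if self.min_interrupt_confidence is None:
            # Assumption: placeholder for the default interrupting behavior.
            return True
        return confidence > self.min_interrupt_confidence


config = SketchTranscriberConfig(
    sampling_rate=16000,
    audio_encoding=AudioEncoding.LINEAR16,
    chunk_size=3200,
    downsampling=2,
    min_interrupt_confidence=0.6,
)
print(config.chunk_duration_seconds())
print(config.effective_sampling_rate())
print(config.should_interrupt(0.5), config.should_interrupt(0.9))
```

Under these assumptions, a 3200-byte chunk of 16 kHz LINEAR16 audio holds 1600 samples, or about a tenth of a second of audio per send.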

DeepgramTranscriberConfig

language
Optional[str]
Optional language code to use for transcription.
model
Optional[str]
Optional Deepgram model to use for transcription. Defaults to “nova”.
tier
Optional[str]
Optional Deepgram tier to use for transcription.
version
Optional[str]
Optional Deepgram version to use for transcription. Defaults to the latest version if not provided.
keywords
Optional[List[str]]
Optional list of keywords to boost in the transcription results.
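
One plausible way for optional fields like these to reach Deepgram is as URL query parameters, with unset fields omitted. The sketch below assumes that mapping; the reference above does not say how the library forwards them, and build_deepgram_query is a hypothetical helper, not part of any library.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_deepgram_query(
    language: Optional[str] = None,
    model: Optional[str] = "nova",  # default taken from the field description above
    tier: Optional[str] = None,
    version: Optional[str] = None,
    keywords: Optional[List[str]] = None,
) -> str:
    """Hypothetical helper: encode only the fields that were provided."""
    params = {
        "model": model,
        "language": language,
        "tier": tier,
        "version": version,
        "keywords": keywords,
    }
    present = {name: value for name, value in params.items() if value is not None}
    # doseq=True emits one key=value pair per list element.
    return urlencode(present, doseq=True)


query = build_deepgram_query(language="en-US", keywords=["Vocode", "Deepgram"])
print(query)
```

Leaving a field as None drops it from the query entirely, so the service-side default applies instead.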

GoogleTranscriberConfig

model
Optional[str]
Optional Google Cloud Speech model to use for transcription.
language_code
str
Language code to use for transcription. Defaults to “en-US”.

AssemblyAITranscriberConfig

buffer_size_seconds
float
Duration of audio, in seconds, to accumulate before sending to AssemblyAI. Defaults to 0.1 seconds.
word_boost
Optional[List[str]]
Optional list of words to boost in the transcription results.
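
To illustrate what buffer_size_seconds controls, here is a minimal buffering sketch. AudioBuffer is a hypothetical class, not the library's implementation, and it assumes mono audio at 2 bytes per sample (LINEAR16): incoming chunks are accumulated until the buffer holds at least buffer_size_seconds of audio, then flushed as a single payload.

```python
from typing import Optional


class AudioBuffer:
    """Hypothetical accumulator: flushes once enough audio has been collected."""

    def __init__(
        self,
        buffer_size_seconds: float,
        sampling_rate: int,
        bytes_per_sample: int = 2,  # assumption: 16-bit mono PCM
    ) -> None:
        # Convert the duration threshold into a byte count.
        self.threshold_bytes = round(
            buffer_size_seconds * sampling_rate * bytes_per_sample
        )
        self._buffer = bytearray()

    def add(self, chunk: bytes) -> Optional[bytes]:
        """Append a chunk; return the buffered payload when the threshold is met."""
        self._buffer.extend(chunk)
        if len(self._buffer) >= self.threshold_bytes:
            payload = bytes(self._buffer)
            self._buffer.clear()
            return payload
        return None


buffer = AudioBuffer(buffer_size_seconds=0.1, sampling_rate=16000)
first = buffer.add(b"\x00" * 1600)   # below the threshold, nothing to send yet
second = buffer.add(b"\x00" * 1600)  # threshold reached, payload returned
print(buffer.threshold_bytes, first, len(second))
```

A larger buffer means fewer, larger sends at the cost of added delay before each one.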

WhisperCPPTranscriberConfig

buffer_size_seconds
float
Duration of audio, in seconds, to accumulate before sending to WhisperCPP. Defaults to 1.0 seconds.
libname
str
Filename of the WhisperCPP shared library.
fname_model
str
Filename of the WhisperCPP model.

AzureTranscriberConfig

language
str
Language code to use for transcription. Defaults to “en-US”.
candidate_languages
Optional[List[str]]
Optional list of candidate languages for automatic language detection from the audio.

GladiaTranscriberConfig

buffer_size_seconds
float
Duration of audio, in seconds, to accumulate before sending to Gladia. Defaults to 0.1 seconds.