TranscriberConfig

sampling_rate
int

The sampling rate of the audio in samples per second (Hz). A higher sampling rate provides better audio quality but may increase processing time and data size.

audio_encoding
AudioEncoding

The encoding format of the audio data. Options include: LINEAR16, MULAW.

chunk_size
int

The size of each chunk of audio data sent to the transcriber, in bytes. A larger chunk size can reduce network overhead but may increase latency.
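
To make the latency trade-off concrete, the duration of one chunk can be derived from chunk_size, sampling_rate, and the sample width of the encoding. The sketch below assumes mono audio with 2 bytes per sample for LINEAR16 and 1 byte per sample for MULAW. The helper chunk_duration_seconds and the BYTES_PER_SAMPLE table are illustrative names, not part of the library.

```python
# Bytes per sample for each encoding (assumption: mono audio).
BYTES_PER_SAMPLE = {"LINEAR16": 2, "MULAW": 1}


def chunk_duration_seconds(chunk_size: int, sampling_rate: int, encoding: str) -> float:
    """Return how many seconds of audio one chunk of `chunk_size` bytes holds."""
    samples_per_chunk = chunk_size / BYTES_PER_SAMPLE[encoding]
    return samples_per_chunk / sampling_rate


# A 2048-byte LINEAR16 chunk at 16 kHz holds 1024 samples.
print(chunk_duration_seconds(2048, 16000, "LINEAR16"))  # 0.064
# The same chunk size at 8 kHz MULAW holds 2048 samples.
print(chunk_duration_seconds(2048, 8000, "MULAW"))      # 0.256
```

Each chunk has to be filled before it can be sent, so its duration is a rough lower bound on the latency that chunking adds.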

endpointing_config
Optional[EndpointingConfig]

Optional endpointing configuration to determine when a transcription segment should end. If not provided, default endpointing behavior will be used.

downsampling
Optional[int]

Optional downsampling factor to reduce the sampling rate of the audio before sending to the transcriber. Can be used to reduce bandwidth usage.
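
The source does not say how downsampling is performed, so the following is only a sketch of what an integer downsampling factor could mean for LINEAR16 audio: keeping every n-th sample. Both function names are hypothetical. Naive decimation like this can introduce aliasing; a production implementation would normally apply a low-pass filter first.

```python
import array


def downsample_linear16(chunk: bytes, factor: int) -> bytes:
    """Keep every `factor`-th 16-bit sample (naive decimation, no low-pass filter)."""
    samples = array.array("h")  # signed 16-bit integers, native byte order
    samples.frombytes(chunk)
    return samples[::factor].tobytes()


def effective_sampling_rate(sampling_rate: int, factor: int) -> int:
    """Sampling rate of the audio after downsampling by `factor`."""
    return sampling_rate // factor


original = array.array("h", [0, 10, 20, 30, 40, 50]).tobytes()
reduced = downsample_linear16(original, 2)
print(len(original), len(reduced))        # 12 6
print(effective_sampling_rate(48000, 3))  # 16000
```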

min_interrupt_confidence
Optional[float]

Optional minimum confidence threshold for interrupting the transcription. Confidence values range from 0 to 1, with higher values indicating greater confidence. If provided, transcriptions are interrupted only when the confidence exceeds the threshold. If not provided, the default interruption behavior is used.
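
The threshold rule described above can be sketched as a small predicate. The function name should_interrupt is hypothetical, and the default argument merely stands in for the library's default behavior, which the description does not specify.

```python
from typing import Optional


def should_interrupt(
    confidence: float,
    min_interrupt_confidence: Optional[float],
    default: bool = True,
) -> bool:
    """Decide whether a transcription result may trigger an interruption.

    `default` is an assumption standing in for the unspecified default behavior.
    """
    if min_interrupt_confidence is None:
        return default
    # Interrupt only when the confidence exceeds the threshold.
    return confidence > min_interrupt_confidence


print(should_interrupt(0.92, 0.8))   # True
print(should_interrupt(0.55, 0.8))   # False
print(should_interrupt(0.55, None))  # True (assumed default)
```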

mute_during_speech
bool

If true, silent audio chunks are sent to the transcriber while a transcription is in progress. Can be used to prevent echo during live transcription.
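
As a rough illustration of muting, a real chunk can be replaced with digital silence of the same length before it is sent. The sketch assumes zero bytes represent silence for LINEAR16 and the byte 0xFF (the mu-law code for zero amplitude) for MULAW. Both function names are illustrative and may not match how the library does it.

```python
def create_silent_chunk(chunk_size: int, encoding: str) -> bytes:
    """Return `chunk_size` bytes of digital silence for the given encoding."""
    if encoding == "LINEAR16":
        return b"\x00" * chunk_size  # zero-amplitude 16-bit PCM samples
    if encoding == "MULAW":
        return b"\xff" * chunk_size  # 0xFF is the mu-law code for zero amplitude
    raise ValueError(f"unsupported encoding: {encoding}")


def chunk_to_send(chunk: bytes, muted: bool, encoding: str) -> bytes:
    """Swap the real chunk for silence of the same length while muted."""
    return create_silent_chunk(len(chunk), encoding) if muted else chunk


audio = b"\x01\x02\x03\x04"
print(chunk_to_send(audio, muted=False, encoding="LINEAR16"))  # b'\x01\x02\x03\x04'
print(chunk_to_send(audio, muted=True, encoding="LINEAR16"))   # b'\x00\x00\x00\x00'
```

Keeping the silent chunk the same length as the original preserves the timing of the audio stream.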

DeepgramTranscriberConfig

language
Optional[str]

Optional language code to use for transcription.

model
Optional[str]

Optional Deepgram model to use for transcription. Defaults to “nova”.

tier
Optional[str]

Optional Deepgram tier to use for transcription.

version
Optional[str]

Optional Deepgram version to use for transcription. Defaults to the latest version if not provided.

keywords
Optional[List[str]]

Optional list of keywords to boost in the transcription results.
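
One way to picture how these optional fields might reach Deepgram is as URL query parameters. The sketch below assumes each field maps to a query parameter of the same name, which the source does not confirm. The helper build_query is hypothetical, and the keyword values are made up for illustration.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_query(
    language: Optional[str] = None,
    model: Optional[str] = "nova",
    tier: Optional[str] = None,
    version: Optional[str] = None,
    keywords: Optional[List[str]] = None,
) -> str:
    """Encode the fields that are set as a URL query string, skipping unset ones."""
    fields = {
        "language": language,
        "model": model,
        "tier": tier,
        "version": version,
        "keywords": keywords,
    }
    set_fields = {name: value for name, value in fields.items() if value}
    # doseq=True repeats the key once per list item (keywords=a&keywords=b).
    return urlencode(set_fields, doseq=True)


print(build_query(language="en-US", keywords=["endpointing", "transcriber"]))
# language=en-US&model=nova&keywords=endpointing&keywords=transcriber
```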

GoogleTranscriberConfig

model
Optional[str]

Optional Google Cloud Speech model to use for transcription.

language_code
str

Language code to use for transcription. Defaults to “en-US”.

AssemblyAITranscriberConfig

buffer_size_seconds
float

Buffer duration in seconds to accumulate audio before sending to AssemblyAI. Defaults to 0.1 seconds.
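
Buffering by duration amounts to collecting chunks until a byte threshold derived from the sampling rate is reached. The following is a minimal sketch, assuming mono LINEAR16 audio (2 bytes per sample). The AudioBuffer class is hypothetical and not the library's implementation.

```python
from typing import Optional


class AudioBuffer:
    """Accumulate chunks and release them once enough audio has been collected."""

    def __init__(self, buffer_size_seconds: float, sampling_rate: int, bytes_per_sample: int = 2):
        # Number of bytes that corresponds to the requested duration.
        self.threshold = round(buffer_size_seconds * sampling_rate * bytes_per_sample)
        self._buffer = bytearray()

    def add(self, chunk: bytes) -> Optional[bytes]:
        """Append a chunk; return the buffered audio once the threshold is reached, else None."""
        self._buffer.extend(chunk)
        if len(self._buffer) < self.threshold:
            return None
        ready = bytes(self._buffer)
        self._buffer.clear()
        return ready


buffer = AudioBuffer(buffer_size_seconds=0.1, sampling_rate=16000)
print(buffer.threshold)                      # 3200
print(buffer.add(b"\x00" * 2048))            # None
print(len(buffer.add(b"\x00" * 2048)))       # 4096
```

A larger buffer_size_seconds means fewer, larger sends at the cost of waiting longer before each one.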

word_boost
Optional[List[str]]

Optional list of words to boost in the transcription results.

WhisperCPPTranscriberConfig

buffer_size_seconds
float

Buffer duration in seconds to accumulate audio before sending to WhisperCPP. Defaults to 1.0 seconds.

libname
str

Filename of the WhisperCPP shared library.

fname_model
str

Filename of the WhisperCPP model.

AzureTranscriberConfig

language
str

Language code to use for transcription. Defaults to “en-US”.

candidate_languages
Optional[List[str]]

Optional list of candidate languages for automatic language detection from the audio.

GladiaTranscriberConfig

buffer_size_seconds
float

Buffer duration in seconds to accumulate audio before sending to Gladia. Defaults to 0.1 seconds.