The sampling rate of the audio in samples per second (Hz). A higher sampling rate provides better audio quality but may increase processing time and data size.
The encoding format of the audio data. Options include: LINEAR16, MULAW.
The size of each chunk of audio data sent to the transcriber, in bytes. A larger chunk size can reduce network overhead but may increase latency.
Optional configuration for endpointing, which determines when to split the transcript based on criteria such as time or punctuation. If not provided, the default endpointing behavior will be used.
Optional minimum confidence threshold used to decide when a transcription should trigger an interruption. Confidence values range from 0 to 1, with higher values indicating greater confidence. If provided, a transcription triggers an interruption only when its confidence exceeds the threshold. If not provided, the default interrupting behavior will be used.
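Putting these fields together, here is a minimal sketch of constructing a transcriber config explicitly. The import paths and the DeepgramTranscriberConfig / PunctuationEndpointingConfig names are assumptions based on the vocode-python package layout:

```python
# Sketch: explicit construction of a transcriber config.
# Import paths and class names are assumed and may differ in your version.
from vocode.streaming.models.audio_encoding import AudioEncoding
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

transcriber_config = DeepgramTranscriberConfig(
    sampling_rate=16000,                   # samples per second (Hz)
    audio_encoding=AudioEncoding.LINEAR16,
    chunk_size=2048,                       # bytes per audio chunk
    endpointing_config=PunctuationEndpointingConfig(),  # split on punctuation
    min_interrupt_confidence=0.8,          # only interrupt above 0.8 confidence
)
```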
The TranscriberConfig class provides helper methods that fill in several of these fields for you. For example, when using an input device like MicrophoneInput, you can build the config directly from the device.
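A minimal sketch of this, assuming a `from_input_device` helper and an already-created `microphone_input` (both names are assumptions based on the vocode-python API):

```python
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

# `microphone_input` would come from vocode's device helpers (assumed name).
# The helper copies the device's sampling rate and audio encoding into the
# config, so you only supply the remaining fields.
transcriber_config = DeepgramTranscriberConfig.from_input_device(
    microphone_input,
    endpointing_config=PunctuationEndpointingConfig(),
)
```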
You can also do this for telephone calls, which all share the same sampling rate, chunk size, and audio encoding.
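A sketch of the telephone variant, assuming a `from_telephone_input_device` helper (the name is an assumption based on the vocode-python API):

```python
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)

# No input device is needed: telephone audio is standardized, so the helper
# fills in the telephony sampling rate, chunk size, and audio encoding.
transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    endpointing_config=PunctuationEndpointingConfig(),
)
```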
The language code for the transcription, e.g., ‘en-US’ for American English. If not provided, the default language will be used.
The model used for transcription. For Deepgram, options include ‘phonecall’ and ‘voicemail’. If not provided, the default model will be used.
The tier of the Deepgram API to use for the transcription, e.g., ‘enhanced’. If not provided, the default tier will be used.
The version of the Deepgram API to use for the transcription. If not provided, the latest version will be used.
A list of keywords to be used for keyword spotting during transcription with the Deepgram API. If not provided, no keyword spotting will be performed.
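The Deepgram-specific fields above can be passed alongside the shared ones. A hedged sketch (the field names `language`, `model`, `tier`, and `keywords` follow the descriptions above; treat the exact API as an assumption):

```python
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig

# Sketch: a Deepgram config for a telephone call with keyword spotting.
transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    language="en-US",        # transcription language
    model="phonecall",       # Deepgram model tuned for phone audio
    tier="enhanced",         # Deepgram API tier
    keywords=["Vocode"],     # boost recognition of these terms
)
```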
The language code for the transcription, e.g., ‘en-US’ for American English. Defaults to ‘en-US’.
The model used for transcription. For Google, it can be ‘default’, ‘video’, or ‘phone_call’. If not provided, the default model will be used.
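Similarly for Google, a hedged sketch (the field names `language_code` and `model` follow the descriptions above; the exact API is an assumption):

```python
from vocode.streaming.models.transcriber import GoogleTranscriberConfig

# Sketch: a Google config for a telephone call using the phone-tuned model.
transcriber_config = GoogleTranscriberConfig.from_telephone_input_device(
    language_code="en-US",   # defaults to 'en-US' if omitted
    model="phone_call",      # Google model tuned for phone audio
)
```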