Tracing
[Beta] Benchmarking script
The benchmarking script is located at playground/streaming/benchmark.py. You can run it via the CLI to evaluate and compare transcribers, agents, and synthesizers. It is primarily intended for benchmarking latency, but it can also be used to compare the quality of the different providers. The feature is in Beta and will continue to be improved upon, so feel free to open an issue with any ideas.
Using the CLI
To view all available options of the benchmarking script, run
python playground/streaming/benchmark.py --help
To run multiple trials and average the results, you can control the number of cycles:
--{transcriber,agent,synthesizer}_num_cycles 3 # component specific
--all_num_cycles 3 # all components
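For example, to average synthesizer results over three runs (assuming the cycle flag can be combined with the synthesizer options shown below), an invocation might look like:
python playground/streaming/benchmark.py --synthesizers Google Azure --synthesizer_num_cycles 3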
To run a comprehensive test across all supported transcribers, agents, and synthesizers, use the --all flag.
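For instance, assuming --all can be combined with the cycle flag above, a comprehensive run averaged over three cycles might look like:
python playground/streaming/benchmark.py --all --all_num_cycles 3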
With the CLI, you can get the raw output, write it to a file, and create graphs. Results are stored in the benchmark_results directory by default; you can change this location with the --results_dir and --results_file options. To create visual graphs, add the --create_graphs option when running your test.
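For example, to write results to a custom location and generate graphs (the directory and file names here are only placeholders), you might run:
python playground/streaming/benchmark.py --synthesizers Google Azure --results_dir my_results --results_file results.json --create_graphs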
Example: comparing synthesizers
To compare different synthesizers, use the --synthesizers flag followed by the names of the synthesizers you wish to compare. For instance,
python playground/streaming/benchmark.py --synthesizers Google Azure --synthesizer_text "Your text here"
Example: comparing transcribers
To compare different transcribers, use the --transcribers flag followed by the names of the transcribers you wish to compare. For example,
python playground/streaming/benchmark.py --transcribers deepgram assemblyai --transcriber_audio sample.wav
You can pass --transcriber_use_mic instead of --transcriber_audio to use your microphone as the audio source.
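For example, assuming the flag spelling follows the pattern of the other transcriber options, a microphone-based comparison might look like:
python playground/streaming/benchmark.py --transcribers deepgram assemblyai --transcriber_use_mic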
Example: comparing agents
To compare different agents, use the --agents flag followed by the names of the agents you want to compare. For example,
python playground/streaming/benchmark.py --agents openai anthropic
You can set the prompt preamble with the --agent_prompt_preamble argument and the first input with the --agent_first_input option.
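For instance, to compare agents on a custom prompt (the preamble and first input below are purely illustrative), you might run:
python playground/streaming/benchmark.py --agents openai anthropic --agent_prompt_preamble "You are a helpful assistant" --agent_first_input "What is the weather today?"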
Tracing your application
At the top of quickstarts/streaming_conversation.py, include the following code:
from collections import defaultdict

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, SpanExporter
from opentelemetry.sdk.resources import Resource


class PrintDurationSpanExporter(SpanExporter):
    def __init__(self):
        super().__init__()
        # Collect durations (in seconds) keyed by span name
        self.spans = defaultdict(list)

    def export(self, spans):
        for span in spans:
            # Span timestamps are in nanoseconds; convert to seconds
            duration_ns = span.end_time - span.start_time
            duration_s = duration_ns / 1e9
            self.spans[span.name].append(duration_s)

    def shutdown(self):
        # Print the average duration for each span name
        for name, durations in self.spans.items():
            print(f"{name}: {sum(durations) / len(durations)}")


trace.set_tracer_provider(TracerProvider(resource=Resource.create({})))
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(PrintDurationSpanExporter())
)
This will print the average duration of each traced operation after the conversation ends.
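If you want to sanity-check the exporter outside of a full conversation, a minimal sketch along these lines (building on the setup above; the span name demo.operation is just an illustration, not something Vocode emits) creates a few spans and shuts down the provider so the averages get printed:

import time

tracer = trace.get_tracer(__name__)

# Each span is exported by SimpleSpanProcessor as soon as it ends.
for _ in range(3):
    with tracer.start_as_current_span("demo.operation"):
        time.sleep(0.01)

# Shutting down the provider invokes the exporter's shutdown(),
# which prints the average duration per span name.
trace.get_tracer_provider().shutdown()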