Benchmark Guide
This directory contains reproducible benchmark runners. Each script does one job:
- dsp_benchmark_suite.py: general DSP throughput, latency, and memory.
- compare_fft_benchmarks.py: staged FFT comparison between PYTHUSA and joblib.
- rocketdata_test.py: sample-rate and window-size driven detection-latency budgeting.
- numba_candidate_benchmark.py: backend experiment for user-side loop acceleration.
Canonical Commands
Use these commands for repeatable release and README measurements:
python benchmarks/dsp_benchmark_suite.py --balanced --json-out benchmarks/results/dsp-balanced.json
python benchmarks/dsp_benchmark_suite.py --latency-min --kernels rfft,power_spectrum,stft --json-out benchmarks/results/dsp-latency-fft.json
python benchmarks/dsp_benchmark_suite.py --balanced --graph --graph-out benchmarks/results/dsp-balanced-heatmaps.png --no-show
python benchmarks/compare_fft_benchmarks.py --json-out benchmarks/results/fft-compare.json
python benchmarks/rocketdata_test.py --json-out benchmarks/results/rocket-latency.json
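The JSON-producing commands above can also be driven from one script, which may help keep release measurements consistent. The following is a minimal sketch, not part of the repository: `CANONICAL_RUNS`, `build_commands`, and `run_all` are hypothetical names, and it assumes the script is run from the repository root. The heatmap command is left out because it writes an image rather than JSON.

```python
import subprocess
import sys
from pathlib import Path

# Output directory used by the canonical commands.
RESULTS_DIR = Path("benchmarks/results")

# (script, extra arguments, output file name), copied from the commands above.
CANONICAL_RUNS = [
    ("benchmarks/dsp_benchmark_suite.py", ["--balanced"], "dsp-balanced.json"),
    (
        "benchmarks/dsp_benchmark_suite.py",
        ["--latency-min", "--kernels", "rfft,power_spectrum,stft"],
        "dsp-latency-fft.json",
    ),
    ("benchmarks/compare_fft_benchmarks.py", [], "fft-compare.json"),
    ("benchmarks/rocketdata_test.py", [], "rocket-latency.json"),
]


def build_commands(python=sys.executable):
    """Return one argv list per canonical run (hypothetical helper)."""
    commands = []
    for script, extra_args, out_name in CANONICAL_RUNS:
        out_path = (RESULTS_DIR / out_name).as_posix()
        commands.append([python, script, *extra_args, "--json-out", out_path])
    return commands


def run_all():
    """Run every canonical command in order, stopping at the first failure."""
    # Assumption: the runners may not create the output directory themselves.
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    for command in build_commands():
        subprocess.run(command, check=True)
```

Calling `run_all()` from the repository root executes the four runs in sequence; `build_commands()` alone can be used to print the commands without running them.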
Structured Output
The main benchmark runners support:
- --json: print structured JSON to stdout instead of the text table.
- --json-out PATH: write structured JSON to a file.
- --label NAME: attach a run label to the structured output.
- --graph: render DSP suite heatmaps after the benchmark completes.
- --graph-out PATH: save those heatmaps to an image file.
- --no-show: build graph output without opening a matplotlib window.
The recommended result filename pattern is:
benchmarks/results/<benchmark>-<profile-or-focus>.json
Examples:
- benchmarks/results/dsp-balanced.json
- benchmarks/results/dsp-latency-fft.json
- benchmarks/results/rocket-latency.json
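The filename pattern above can be generated rather than typed by hand. This is a small sketch under that pattern only; `result_path` is a hypothetical helper, not something the benchmark scripts provide.

```python
from pathlib import Path


def result_path(benchmark, focus, root="benchmarks/results"):
    """Build <root>/<benchmark>-<profile-or-focus>.json (hypothetical helper)."""
    if not benchmark or not focus:
        raise ValueError("benchmark and focus must be non-empty")
    return Path(root) / f"{benchmark}-{focus}.json"


# Reproduces the first example path listed above.
print(result_path("dsp", "balanced").as_posix())
```

The returned path can be passed as the value of --json-out.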
Showcase Demo Benchmarks
The two showcase demos in examples/ double as end-to-end benchmarks for the full PYTHUSA stack. See the Showcase Demos page for full results, architecture walkthroughs, and run commands.
Measurement Notes
- DSP suite latency fields are processing-side latencies from "frame ready" to "consumer finished".
- Rocket benchmark total_*_detection_ms fields add window fill time to that processing latency.
- task_rss_mb is summed worker RSS and can overcount shared-memory mappings.
- Benchmark outputs are machine-dependent; compare like-for-like configurations.
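The relationship between processing latency and total detection latency can be illustrated with a short calculation. This sketch assumes that window fill time is the time needed to collect one full window, that is, window size divided by sample rate; the exact formula used by rocketdata_test.py is not stated here, and the function names and the numbers in the example are illustrative only.

```python
def window_fill_ms(window_size, sample_rate_hz):
    """Time to collect one full window, in milliseconds.

    Assumption: fill time = window_size / sample_rate_hz.
    """
    if window_size <= 0 or sample_rate_hz <= 0:
        raise ValueError("window_size and sample_rate_hz must be positive")
    return 1000.0 * window_size / sample_rate_hz


def total_detection_ms(window_size, sample_rate_hz, processing_ms):
    """Window fill time plus the processing-side latency."""
    return window_fill_ms(window_size, sample_rate_hz) + processing_ms


# Illustrative values: a 1024-sample window at 48 kHz, 0.5 ms of processing.
fill = window_fill_ms(1024, 48_000)
total = total_detection_ms(1024, 48_000, 0.5)
print(f"fill={fill:.3f} ms, total={total:.3f} ms")
```

Under this assumption the window fill term tends to dominate at low sample rates or large windows, which is why the two kinds of latency field should not be compared directly.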