Benchmarks · Independent

The end-to-end latency benchmark for agents.

Most published inference benchmarks measure single-call performance. We measure what actually predicts agent UX: time to first token (TTFT), sustained throughput, and end-to-end task latency under concurrency.
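For readers who want the shape of the measurement, here is a minimal sketch of timing TTFT and sustained decode throughput from a streaming completion call. The endpoint URL, payload, and one-token-per-event framing are placeholder assumptions, not our harness; real providers frame streams differently.

```python
import time
import requests  # any streaming HTTP client works

def measure_stream(url: str, payload: dict) -> dict:
    """Time one streaming completion: TTFT, decode throughput, e2e wall clock."""
    start = time.perf_counter()
    ttft = None
    tokens = 0
    # Hypothetical SSE-style streaming endpoint; adapt parsing per provider.
    with requests.post(url, json=payload, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            if ttft is None:
                ttft = time.perf_counter() - start  # first token defines TTFT
            tokens += 1  # assumes one token per event; real harnesses count tokenizer tokens
    total = time.perf_counter() - start
    # Sustained throughput excludes the prefill wait before the first token.
    decode_time = total - (ttft or 0.0)
    return {
        "ttft_ms": (ttft or 0.0) * 1000,
        "tokens_per_sec": tokens / decode_time if decode_time > 0 else 0.0,
        "e2e_s": total,
    }
```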

3–5× lower TTFT vs. major providers
3.0× faster e2e p95 on agent tasks
60% lower cost per 1k completed tasks
Last refreshed Q1 2026; re-run quarterly
Headline results

Synaptix vs. major providers.

Concurrency: 16. Hardware: provider default. Where a model is unavailable on a provider, the cell shows "—". All numbers are medians across 100 runs from us-east. Reproduce with the harness in our methodology pack.

| Model | Metric | Synaptix | Bedrock | Vertex | Together | Fireworks |
|---|---|---|---|---|---|---|
| gpt-oss-120B | TTFT (ms) | 182 | — | — | 318 | 291 |
| gpt-oss-120B | Tokens/sec (single) | 412 | — | — | 248 | 276 |
| Llama 4 405B | TTFT (ms) | 240 | 612 | 598 | 445 | 402 |
| Llama 4 405B | p95 e2e task (s) | 3.1 | 9.4 | 8.8 | 5.7 | 5.2 |
| Kimi-K2.5 (1M ctx) | TTFT (ms) | 388 | — | — | 972 | — |
| Qwen3-Coder | Tokens/sec (single) | 528 | — | — | 342 | 381 |
| Mixed agent workload | Cost / 1k tasks (USD) | $2.40 | $8.10 | $7.65 | $4.20 | $3.95 |

Methodology, harness scripts, and raw runs are available under NDA.
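The published setup (concurrency 16, medians over 100 runs, p95 for task latency) reduces to a small aggregation loop. A sketch of that shape, assuming a `run_task()` callable that returns the same dict as the `measure_stream` sketch above; this is illustrative, not the NDA harness.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor

CONCURRENCY = 16  # matches the published setup
RUNS = 100        # medians are taken across 100 runs

def benchmark(run_task) -> dict:
    """Run `run_task` RUNS times at fixed concurrency and aggregate."""
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(lambda _: run_task(), range(RUNS)))
    ttfts = sorted(r["ttft_ms"] for r in results)
    e2es = sorted(r["e2e_s"] for r in results)
    return {
        "ttft_ms_median": statistics.median(ttfts),
        # p95 via the nearest-rank method on the sorted sample.
        "e2e_s_p95": e2es[max(0, int(0.95 * len(e2es)) - 1)],
    }
```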

What we measure (and why)
TTFT

Time to first token. Drives perceived responsiveness.

Throughput

Sustained tokens/sec under concurrency. Drives capacity.

End-to-end p95

Wall-clock task completion at the 95th percentile. Drives UX.

Cost / task

Token usage × unit prices, plus tool costs, attributed per completed task.
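Cost attribution is plain arithmetic once tokens and tool spend are logged per task. A hedged sketch of the computation; the field names and per-million-token pricing convention are placeholders:

```python
def cost_per_1k_tasks(tasks: list[dict],
                      price_in_per_mtok: float,
                      price_out_per_mtok: float) -> float:
    """Token usage × unit prices plus tool costs, over completed tasks only."""
    completed = [t for t in tasks if t["completed"]]
    total = sum(
        t["input_tokens"] / 1e6 * price_in_per_mtok
        + t["output_tokens"] / 1e6 * price_out_per_mtok
        + t.get("tool_cost_usd", 0.0)
        for t in completed
    )
    # Normalize to cost per 1,000 completed tasks; failed tasks
    # still cost money but do not count toward the denominator.
    return total / len(completed) * 1000
```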

Resources

What runs behind the numbers

Want to see your workload benchmarked?

We'll run your representative agent against the major providers and share the raw numbers.