Benchmarks · Independent
The end-to-end latency benchmark for agents.
Most inference benchmarks measure single calls. We measure what predicts agent UX: time to first token (TTFT), sustained throughput, and end-to-end (e2e) task latency under concurrency.
1.6–2.6×
Lower TTFT vs. major providers
3.0×
Faster e2e p95 on agent tasks, best case (Llama 4 405B vs. Bedrock)
↓ 60%
Cost per 1k completed tasks
2026 Q1
Last refresh — re-run quarterly
Headline results
Synaptix vs. major providers.
Concurrency 16, provider-default hardware. "—" = model unavailable. Medians across 100 runs from us-east. Reproduce with our harness.
| Model | Metric | Synaptix | Bedrock | Vertex | Together | Fireworks |
|---|---|---|---|---|---|---|
| gpt-oss-120B | TTFT (ms) | 182 | — | — | 318 | 291 |
| gpt-oss-120B | Tokens/sec (single) | 412 | — | — | 248 | 276 |
| Llama 4 405B | TTFT (ms) | 240 | 612 | 598 | 445 | 402 |
| Llama 4 405B | p95 e2e task (s) | 3.1 | 9.4 | 8.8 | 5.7 | 5.2 |
| Kimi-K2.5 (long ctx 1M) | TTFT (ms) | 388 | — | — | 972 | — |
| Qwen3-Coder | Tokens/sec (single) | 528 | — | — | 342 | 381 |
| Mixed agent workload | Cost / 1k tasks (USD) | $2.40 | $8.10 | $7.65 | $4.20 | $3.95 |
Methodology, harness scripts, and raw runs are available under NDA.
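As a quick sanity check, the per-provider ratios can be recomputed from the table above. The sketch below copies the Llama 4 405B and mixed-workload rows and divides each provider's figure by the Synaptix figure. How the headline numbers were aggregated is not stated on this page, so `saving_vs_mean` (a hypothetical helper) shows only one assumed aggregation: the reduction relative to the mean of the other four providers.

```python
# Figures copied from the Llama 4 405B and mixed-workload rows of the table above.
PROVIDERS = ["Bedrock", "Vertex", "Together", "Fireworks"]
SYNAPTIX = {"ttft_ms": 240, "p95_e2e_s": 3.1, "cost_per_1k_usd": 2.40}
OTHERS = {
    "ttft_ms": [612, 598, 445, 402],
    "p95_e2e_s": [9.4, 8.8, 5.7, 5.2],
    "cost_per_1k_usd": [8.10, 7.65, 4.20, 3.95],
}


def ratios(metric: str) -> dict[str, float]:
    """Return each provider's value divided by the Synaptix value."""
    base = SYNAPTIX[metric]
    return {name: value / base for name, value in zip(PROVIDERS, OTHERS[metric])}


def saving_vs_mean(metric: str) -> float:
    """Fractional reduction relative to the mean of the other providers.

    Assumption: averaging the four providers is one possible aggregation,
    not necessarily the one behind the headline figure.
    """
    values = OTHERS[metric]
    return 1 - SYNAPTIX[metric] / (sum(values) / len(values))


if __name__ == "__main__":
    for metric in OTHERS:
        line = ", ".join(f"{name} {r:.2f}x" for name, r in ratios(metric).items())
        print(f"{metric}: {line}")
    saving = saving_vs_mean("cost_per_1k_usd")
    print(f"cost saving vs. mean of others: {saving:.0%}")
```

Lower-is-better metrics all use the same division, so a ratio above 1 means the other provider is slower or more expensive by that factor.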
What we measure (and why)
| Metric | What it is and why it matters |
|---|---|
| TTFT | Time to first token. Drives perceived responsiveness. |
| Throughput | Sustained tokens/sec under concurrency. Drives capacity. |
| End-to-end p95 | Wall-clock task completion time at the 95th percentile. Drives UX. |
| Cost / task | (Input and output tokens × per-token prices) + tool-call costs, attributed per completed task. |
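To make the four definitions concrete, here is a minimal sketch of how they could be derived from per-run timing records. It is not the benchmark harness. The `Run` record and every field and function name are hypothetical. It also makes three labeled assumptions: the percentile uses the nearest-rank method, sustained throughput is total output tokens over the wall-clock window spanned by the concurrent runs, and spend on failed runs is spread over the completed ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """One agent task run. All field names are hypothetical."""
    start_s: float         # request sent
    first_token_s: float   # first streamed token received
    end_s: float           # task finished (or abandoned)
    input_tokens: int
    output_tokens: int
    tool_cost_usd: float
    completed: bool


def ttft_ms(run: Run) -> float:
    """Time to first token, in milliseconds."""
    return (run.first_token_s - run.start_s) * 1000.0


def sustained_tokens_per_sec(runs: list[Run]) -> float:
    """Total output tokens over the wall-clock window spanned by all runs."""
    window = max(r.end_s for r in runs) - min(r.start_s for r in runs)
    return sum(r.output_tokens for r in runs) / window


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (assumed method; others interpolate)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def e2e_p95_s(runs: list[Run]) -> float:
    """95th-percentile wall-clock duration of completed tasks, in seconds."""
    return percentile([r.end_s - r.start_s for r in runs if r.completed], 95)


def cost_per_1k_tasks(runs: list[Run], usd_per_m_input: float,
                      usd_per_m_output: float) -> float:
    """Total spend (tokens plus tools) per 1,000 completed tasks.

    Assumption: spend on failed runs is included and attributed to the
    completed ones.
    """
    completed = sum(1 for r in runs if r.completed)
    if completed == 0:
        raise ValueError("no completed tasks to attribute cost to")
    spend = sum(
        r.input_tokens * usd_per_m_input / 1e6
        + r.output_tokens * usd_per_m_output / 1e6
        + r.tool_cost_usd
        for r in runs
    )
    return 1000 * spend / completed
```

Dividing by completed tasks rather than attempted ones means retries and failures raise the reported cost, which matches the "per completed task" wording above.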
Resources
What runs behind the numbers
Want your workload benchmarked?
We'll run your agent against the major providers and share raw numbers.