The end-to-end latency benchmark for agents.
Most published inference benchmarks measure single-call performance. We measure what actually predicts agent UX: TTFT, sustained throughput, and end-to-end task latency under concurrency.
Synaptix vs. major providers.
Concurrency: 16. Hardware: provider default. Where a model is unavailable on a provider, the cell shows "—". All numbers are medians across 100 runs from us-east. Reproduce with the harness in our methodology pack; a minimal probe sketch follows the table.
| Model | Metric | Synaptix | Bedrock | Vertex | Together | Fireworks |
|---|---|---|---|---|---|---|
| gpt-oss-120B | TTFT (ms) | 182 | — | — | 318 | 291 |
| gpt-oss-120B | Tokens/sec (single) | 412 | — | — | 248 | 276 |
| Llama 4 405B | TTFT (ms) | 240 | 612 | 598 | 445 | 402 |
| Llama 4 405B | p95 e2e task (s) | 3.1 | 9.4 | 8.8 | 5.7 | 5.2 |
| Kimi-K2.5 (long ctx 1M) | TTFT (ms) | 388 | — | — | 972 | — |
| Qwen3-Coder | Tokens/sec (single) | 528 | — | — | 342 | 381 |
| Mixed agent workload | Cost / 1k tasks (USD) | $2.40 | $8.10 | $7.65 | $4.20 | $3.95 |
Methodology, harness scripts, and raw runs are available under NDA.
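The full harness is part of the methodology pack; the fragment below is only a minimal sketch of the probe shape, assuming an OpenAI-compatible streaming endpoint. `ENDPOINT`, `MODEL`, and the `API_KEY` environment variable are placeholders, and it approximates one SSE chunk as one token, which the real harness does not.

```python
import asyncio
import json
import os
import statistics
import time

import aiohttp

ENDPOINT = "https://api.example.com/v1/chat/completions"  # placeholder URL
MODEL = "gpt-oss-120b"                                    # placeholder model id
CONCURRENCY = 16   # matches the concurrency used for the table
RUNS = 100         # medians are taken across 100 runs

async def one_run(session: aiohttp.ClientSession, sem: asyncio.Semaphore):
    """Stream one completion; return (ttft_ms, tokens_per_sec)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Summarize HTTP/1.1 in one sentence."}],
        "stream": True,
        "max_tokens": 256,
    }
    headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}
    async with sem:
        start = time.perf_counter()
        first = None
        chunks = 0
        async with session.post(ENDPOINT, json=payload, headers=headers) as resp:
            async for raw in resp.content:  # aiohttp yields the SSE stream line by line
                line = raw.decode().strip()
                if not line.startswith("data: ") or line == "data: [DONE]":
                    continue
                delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
                if delta.get("content"):
                    if first is None:
                        first = time.perf_counter()  # first content chunk = TTFT
                    chunks += 1  # approximation: one streamed chunk ~ one token
        end = time.perf_counter()
        if first is None:  # empty stream; drop this run from the medians
            return None
        return (first - start) * 1000.0, chunks / max(end - first, 1e-9)

async def main():
    sem = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        runs = await asyncio.gather(*(one_run(session, sem) for _ in range(RUNS)))
    runs = [r for r in runs if r is not None]
    print(f"median TTFT: {statistics.median(r[0] for r in runs):.0f} ms")
    print(f"median tokens/sec: {statistics.median(r[1] for r in runs):.0f}")

if __name__ == "__main__":
    asyncio.run(main())
```

The semaphore caps in-flight requests at 16, so TTFT is measured under the same contention the table reports rather than on an idle connection.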
**TTFT (ms).** Time to first token. Drives perceived responsiveness.
**Tokens/sec.** Sustained tokens per second under concurrency. Drives capacity.
**p95 e2e task (s).** Wall-clock task completion at the 95th percentile. Drives UX.
**Cost / 1k tasks (USD).** Total tokens × prices, plus tool spend, attributed per completed task.
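As a sketch of how the last two roll-ups can be computed from per-task records: the `TaskRun` fields and the per-token prices below are illustrative assumptions, not our billing model or any provider's rates.

```python
import math
import statistics
from dataclasses import dataclass

# Illustrative per-token prices; real attribution uses each provider's
# published rates plus metered tool spend.
PRICE_IN_USD = 0.50 / 1_000_000    # per prompt token (assumed)
PRICE_OUT_USD = 1.50 / 1_000_000   # per completion token (assumed)

@dataclass
class TaskRun:
    completed: bool
    e2e_seconds: float       # wall clock from task start to final answer
    prompt_tokens: int
    completion_tokens: int
    tool_cost_usd: float     # search, code exec, etc., metered per call

def p95(values):
    """95th percentile via the nearest-rank method."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

def rollup(runs: list[TaskRun]) -> dict:
    done = [r for r in runs if r.completed]  # cost is attributed per *completed* task
    total_cost = sum(
        r.prompt_tokens * PRICE_IN_USD
        + r.completion_tokens * PRICE_OUT_USD
        + r.tool_cost_usd
        for r in done
    )
    return {
        "p95_e2e_s": p95(r.e2e_seconds for r in done),
        "cost_per_1k_tasks_usd": 1000 * total_cost / len(done),
    }
```

Attributing cost only to completed tasks means retries and abandoned runs inflate the per-task figure, which is the behavior the table's cost row reflects.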
Want to see your workload benchmarked?
We'll run your representative agent against the major providers and share the raw numbers.