Inference Platform

Dedicated endpoints for every agent — in minutes.

The inference backbone behind Synaptix Agent Platform. Open-source LLMs as dedicated, agent-optimized endpoints with guaranteed latency, private networking and per-workflow routing.

< 2 min
To provision a dedicated endpoint
40+
Open & frontier models
99.99%
Enterprise SLA
Up to 50%
Batch inference savings
Inference Platform services

Four ways to run inference for production agents.

From a shared API call to reserved multi-region endpoints — scales with your agent estate.

Real-time inference

Sub-second latency for interactive agents.

Batch inference

Async processing at up to 50% lower cost.

Post-training & fine-tuning

Fine-tune, distill and align open models on your data. LoRA, SFT, DPO and RL.

Dedicated agent endpoints

Reserved capacity in minutes.

Fast deployment

Dedicated agent endpoints — in minutes, not weeks.

Pick a model, region and latency target from the console — the endpoint goes live and agents start routing traffic immediately. Private networking, custom SLAs and full audit from minute one.

Sub-2-minute provision
Choose model, region and SLA. Live and routing immediately.
Agent-optimized routing
Per-workflow model selection — reasoning, code, long-context or multimodal.
Private & sovereign
VPC peering, BYOC, air-gapped on-prem. Nothing leaves your perimeter.
Elastic scaling
Autoscale from zero to thousands. Reserved capacity for predictable peaks.
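Per-workflow model selection can also be mirrored on the client side. The snippet below is a minimal sketch and not the platform's own routing logic: `WORKFLOW_MODELS`, `DEFAULT_MODEL` and `pick_model` are hypothetical names, and the model ids are assumed to be lowercase forms of the catalog names, so confirm the real ids in the console before relying on them.

```python
# Hypothetical client-side routing table: workflow type -> model id.
# The ids below are assumptions (lowercased catalog names), not confirmed API ids.
WORKFLOW_MODELS = {
    "reasoning": "deepseek-v3.2",
    "code": "qwen3-coder-480b-a35b-instruct",
    "long-context": "kimi-k2.5",
    "multimodal": "minimax-m2.1",
}

# Fallback for workflows that have no dedicated entry.
DEFAULT_MODEL = "gpt-oss-120b"


def pick_model(workflow: str) -> str:
    """Return the model id for a workflow type, falling back to a general model."""
    return WORKFLOW_MODELS.get(workflow.strip().lower(), DEFAULT_MODEL)


for wf in ("code", "Reasoning", "summarization"):
    print(wf, "->", pick_model(wf))
```

The returned id can be passed as the `model` argument of a chat completion call, which keeps routing decisions in one table instead of scattered across agent code.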
Available models

A catalog that ships with the frontier.

New open releases added within days — same API, no migration.

gpt-oss-120B
General · Reasoning
128K context
Kimi-K2.5
Long context · Agents
2M context
Qwen3-Coder-480B-A35B-Instruct
Code · MoE
256K context
GLM-5
Multilingual · Reasoning
128K context
DeepSeek V3.2
Reasoning · MoE
128K context
MiniMax M2.1
Multimodal · Agents
1M context
Nemotron 3 Super
Enterprise · Reasoning
128K context
Llama 4 405B
General purpose
256K context
Mistral Large 3
European · Tooling
128K context
…and more added every month.
Quickstart

OpenAI-compatible. A drop-in replacement in 3 lines.

Python · api.synaptix.ai
from openai import OpenAI

# Point the standard OpenAI client at the Synaptix endpoint.
client = OpenAI(
    base_url="https://api.synaptix.ai/v1",
    api_key="sx_live_…",
)

# Send a single-turn chat request and print the reply text.
resp = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize this report."}],
)
print(resp.choices[0].message.content)
cURL · api.synaptix.ai
curl https://api.synaptix.ai/v1/chat/completions \
  -H "Authorization: Bearer $SX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role":"user","content":"Hello"}]
  }'
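For environments where installing an SDK is not an option, the same request can be built with only the Python standard library. This is a sketch under the assumption that the endpoint returns the usual OpenAI-style response body. `build_chat_request` and `send_chat` are hypothetical helper names, and the URL, headers and payload are taken from the cURL example.

```python
import json
import urllib.request

# Endpoint taken from the cURL example.
API_URL = "https://api.synaptix.ai/v1/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

A typical call reads the key from the `SX_API_KEY` environment variable, as the cURL example does: `send_chat(build_chat_request("deepseek-v3.2", "Hello", os.environ["SX_API_KEY"]))`, with `import os` added.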
Pricing

Pay only for the tokens you use.

Pay-as-you-go
$0 minimum

Free API key. Pay only for tokens consumed.

  • OpenAI-compatible API
  • All open models
  • Batch + real-time
  • Community support
Get an API key
Scale
$500/month minimum

Higher limits, priority routing and analytics for production.

  • 10× higher rate limits
  • Priority queue
  • Usage analytics & budgets
  • Email support · 24h SLA
Start scaling
Enterprise
Custom · annual

Dedicated capacity, private networking, 99.99% SLA.

  • Dedicated GPUs · reserved
  • VPC peering · BYOC
  • Fine-tuning included
  • 99.99% SLA · 24/7 support
Talk to sales

Token prices vary by model. Batch inference is up to 50% lower than real-time.
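To see what the batch discount means for a budget, the arithmetic can be sketched as follows. The price of $2.00 per million tokens is a hypothetical figure chosen for illustration, not a published rate. `token_cost` is a hypothetical helper, and the sketch assumes a single blended price for input and output tokens with the discount applied as a flat fraction of up to 50%.

```python
def token_cost(tokens: int, price_per_million: float, batch_discount: float = 0.0) -> float:
    """Dollar cost of `tokens` at `price_per_million`, with an optional batch discount.

    `batch_discount` is a fraction between 0.0 and 0.5, since batch is
    described as "up to 50%" cheaper than real-time.
    """
    if not 0.0 <= batch_discount <= 0.5:
        raise ValueError("batch_discount must be between 0.0 and 0.5")
    return tokens / 1_000_000 * price_per_million * (1.0 - batch_discount)


# Hypothetical price per million tokens (illustration only).
PRICE = 2.00

realtime = token_cost(10_000_000, PRICE)
batch = token_cost(10_000_000, PRICE, batch_discount=0.5)
print(f"real-time: ${realtime:.2f}, batch: ${batch:.2f}")
# prints "real-time: $20.00, batch: $10.00"
```

Because the discount is stated as an upper bound, the actual saving for a given model may be smaller. Passing a lower `batch_discount` gives a more conservative estimate.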

Resources

Go deeper on the Inference Platform

Benchmarks, technical posts and a printable product brief.

Ship with dedicated agent endpoints today.

Spin up an API key in minutes — or talk to us about reserved capacity and fine-tuning.