TokenFactory

Leading open-source models, served at record speed.

One API for every open model — running on the same agent-native inference cloud that powers Synaptix. Best-in-class latency and throughput, pay-as-you-go.

40+
Open & frontier models
2M
Max context window
99.99%
Enterprise SLA
−50%
Batch inference savings
TokenFactory services

Four ways to run open models in production.

From a single API call to dedicated multi-region deployments — TokenFactory meets you wherever you are.

Inference service

Access and run powerful open-source AI models through a single OpenAI-compatible API. Sub-second latency, 99.9% uptime, pay per token.

Batch inference

Process millions of requests asynchronously at up to 50% lower cost. Ideal for evaluations, embeddings, document pipelines and offline workflows.
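Batch jobs are usually submitted as a JSONL file with one request per line. A minimal sketch of preparing such a file, assuming TokenFactory's batch service accepts the same per-line request shape as OpenAI's Batch API (`custom_id`, `method`, `url`, `body`); that shape is an assumption, not something this page confirms:

```python
import json

def build_batch_lines(prompts, model="deepseek-v3.2"):
    """Return one OpenAI-style batch request line (JSON string) per prompt."""
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",          # used to match results to inputs later
            "method": "POST",
            "url": "/v1/chat/completions",    # same endpoint as the real-time API
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request))
    return lines

if __name__ == "__main__":
    # Write a two-request batch input file.
    lines = build_batch_lines(["Summarize doc A.", "Summarize doc B."])
    with open("batch_input.jsonl", "w") as f:
        f.write("\n".join(lines) + "\n")
```

The `custom_id` is what lets you join asynchronous results back to the original requests once the batch completes.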

Post-training service

Fine-tune, distill and align open models on your proprietary data. LoRA, full SFT, DPO and RL — without managing GPUs.

Enterprise-grade inference

Deploy and scale models on dedicated infrastructure with guaranteed uptime, private networking and custom SLAs.

Available models

A catalog that ships with the frontier.

New open releases are evaluated, optimized and added within days — same API, no migration.

  • gpt-oss-120B · General · Reasoning · 128K context
  • Kimi-K2.5 · Long context · Agents · 2M context
  • Qwen3-Coder-480B-A35B-Instruct · Code · MoE · 256K context
  • GLM-5 · Multilingual · Reasoning · 128K context
  • DeepSeek V3.2 · Reasoning · MoE · 128K context
  • MiniMax M2.1 · Multimodal · Agents · 1M context
  • Nemotron 3 Super · Enterprise · Reasoning · 128K context
  • Llama 4 405B · General purpose · 256K context
  • Mistral Large 3 · European · Tooling · 128K context
…and more added every month.
Quickstart

OpenAI-compatible. Drop-in in 3 lines.

Python · api.tokenfactory.ai
from openai import OpenAI

client = OpenAI(
  base_url="https://api.tokenfactory.ai/v1",
  api_key="tf_live_…",
)

resp = client.chat.completions.create(
  model="gpt-oss-120b",
  messages=[{"role": "user", "content": "Summarize this report."}],
)
print(resp.choices[0].message.content)
cURL · api.tokenfactory.ai
curl https://api.tokenfactory.ai/v1/chat/completions \
  -H "Authorization: Bearer $TF_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role":"user","content":"Hello"}]
  }'
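Because the API is OpenAI-compatible, responses arrive in the standard chat-completions JSON shape. A minimal sketch of pulling the assistant's reply out of a raw response body (the stub below is illustrative only):

```python
import json

def extract_reply(raw: str) -> str:
    """Pull the assistant message out of an OpenAI-style
    chat-completions response body."""
    data = json.loads(raw)
    return data["choices"][0]["message"]["content"]

# A response stub in the OpenAI-compatible shape (not a real API reply).
sample = json.dumps({
    "id": "chatcmpl-123",
    "model": "deepseek-v3.2",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ],
})
print(extract_reply(sample))  # → Hello!
```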
Pricing

Pay only for the tokens you use.

Pay-as-you-go
$0 minimum

Start with a free API key. Pay only for tokens consumed.

  • OpenAI-compatible API
  • All open models
  • Batch + real-time
  • Community support
Get an API key
Scale
$500/month minimum

Higher rate limits, priority routing and analytics for production workloads.

  • 10× higher rate limits
  • Priority queue
  • Usage analytics & budgets
  • Email support · 24h SLA
Start scaling
Enterprise
Custom · annual

Dedicated capacity, private networking and a 99.99% uptime SLA.

  • Dedicated GPUs · reserved
  • VPC peering · BYOC
  • Fine-tuning included
  • 99.99% SLA · 24/7 support
Talk to sales

Token prices vary by model. Batch inference is up to 50% lower than real-time.

Resources

Go deeper on TokenFactory

Benchmarks, technical posts and a printable product brief.

Start shipping with open models today.

Spin up an API key in minutes, or talk to us about dedicated capacity and fine-tuning.