
TokenFactory and the open-model frontier: gpt-oss, Kimi-K2.5, Qwen3-Coder, GLM-5 in production

Open models have caught up. The remaining gap is operational: who serves them fast, who keeps them current, who handles fine-tuning. TokenFactory is our answer.

Ravi Sharma · Head of Open Models, Synaptix · March 25, 2026 · 6 min read

Twelve months ago, the open-source frontier was clearly a step behind. Today, gpt-oss-120B holds its own on reasoning, Kimi-K2.5 stretches context to 2M tokens, Qwen3-Coder dominates code, GLM-5 leads multilingual, and DeepSeek V3.2 set a new bar on cost-per-quality. The open frontier is no longer the catch-up frontier.

What changed

  • Architectures matured. Mixture-of-experts, sparse attention and speculative decoding moved from papers to default choices.
  • Post-training is open. RLHF, DPO, RLAIF and on-policy distillation are no longer secret sauce.
  • Synthetic data scaled. The data moat shrank.
  • Operational tooling caught up. vLLM, TensorRT-LLM and SGLang made open-model serving genuinely production-grade.

Why operations is the remaining moat

Open weights aren't a service. Running them at production speed, keeping the catalog current with releases, handling fine-tuning safely, providing OpenAI-compatible APIs and meeting enterprise SLAs — that's where most teams stall. TokenFactory exists to close that gap.

What we ship

  1. Inference service — OpenAI-compatible API across the entire open frontier, sub-second latency, 99.9% uptime.
  2. Batch inference — asynchronous processing at up to 50% lower cost. Built for evals, embeddings, document pipelines.
  3. Post-training — SFT, LoRA, DPO and RL on your data, no GPU management.
  4. Dedicated deployments — reserved capacity with 99.99% SLA, private networking, custom regions.
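An OpenAI-compatible API means existing client code moves over with a base-URL swap. The sketch below shows the shape of a chat-completions request; the base URL, environment variable, and model ID are illustrative assumptions, not documented TokenFactory values.

```python
import os


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# With the official OpenAI Python SDK, the same payload would be sent as:
#
#   from openai import OpenAI
#   client = OpenAI(
#       base_url="https://api.tokenfactory.example/v1",  # assumed URL
#       api_key=os.environ.get("TOKENFACTORY_API_KEY", ""),  # assumed var
#   )
#   client.chat.completions.create(**build_chat_request(
#       "qwen3-coder", "Review this diff for off-by-one errors."))

payload = build_chat_request("qwen3-coder", "Review this diff for off-by-one errors.")
print(payload["model"])  # qwen3-coder
```

Because the payload schema matches the OpenAI spec, the same helper works against any of the four products above, batch or dedicated, by changing only the endpoint.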

"We swapped four vendors and a homegrown serving stack for one TokenFactory endpoint. Latency improved, costs dropped, and our team got back to building."

What to use when

  • Reasoning-heavy: gpt-oss-120B or DeepSeek V3.2.
  • Long-context retrieval: Kimi-K2.5.
  • Code generation and review: Qwen3-Coder.
  • Multilingual customer-facing: GLM-5.
  • Agent loops: MiniMax M2.1 or Nemotron 3 Super for tool use.

Pick by workload, route by policy, and let TokenFactory handle the rest.
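"Pick by workload, route by policy" can be as simple as a lookup table in front of the API client. A minimal sketch, assuming lowercase model IDs of our own invention (any real platform's IDs may differ):

```python
# Map workload labels to model IDs, mirroring the guidance above.
# The labels and IDs here are illustrative assumptions.
ROUTES = {
    "reasoning": "gpt-oss-120b",
    "long-context": "kimi-k2.5",
    "code": "qwen3-coder",
    "multilingual": "glm-5",
    "agent": "minimax-m2.1",
}


def route(workload: str, default: str = "deepseek-v3.2") -> str:
    """Return the model ID for a workload, falling back to a
    cost-efficient default for anything unrecognized."""
    return ROUTES.get(workload, default)


print(route("code"))     # qwen3-coder
print(route("unknown"))  # deepseek-v3.2
```

In production this table would typically live in config rather than code, so a new model release is a routing change, not a deploy.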


Bring this to your enterprise.

Talk to our team about how Synaptix would map to your stack and your roadmap.