Twelve months ago, the open-source frontier was clearly a step behind. Today, gpt-oss-120B holds its own on reasoning, Kimi-K2.5 stretches context to 2M tokens, Qwen3-Coder dominates code, GLM-5 leads multilingual, and DeepSeek V3.2 sets a new bar on cost-per-quality. The open frontier is no longer the catch-up frontier.
What changed
- Architectures matured. Mixture-of-experts, sparse attention and speculative decoding moved from papers to default choices.
- Post-training is open. RLHF, DPO, RLAIF and on-policy distillation are no longer secret sauce.
- Synthetic data scaled. The proprietary-data moat shrank with it.
- Operational tooling caught up. vLLM, TensorRT-LLM and SGLang made open-model serving genuinely production-grade.
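To make the last point concrete: serving an open model is now a few lines of Python. A minimal sketch using vLLM's offline API, where the model identifier is illustrative and a production deployment would put a server in front of it:

```python
# Minimal vLLM offline-inference sketch. The model id is illustrative;
# any open-weight model hosted on Hugging Face works the same way.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Write a SQL query that deduplicates users by email."], params)
print(outputs[0].outputs[0].text)
```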
Why operations is the remaining moat
Open weights aren't a service. Running them at production speed, keeping the catalog current with releases, handling fine-tuning safely, providing OpenAI-compatible APIs and meeting enterprise SLAs — that's where most teams stall. TokenFactory exists to close that gap.
What we ship
1. Inference service — OpenAI-compatible API across the entire open frontier, sub-second latency, 99.9% uptime (client sketch below).
2. Batch inference — asynchronous processing at up to 50% lower cost, built for evals, embeddings, and document pipelines (batch sketch below).
3. Post-training — SFT, LoRA, DPO, and RL on your data, no GPU management (fine-tuning sketch below).
4. Dedicated deployments — reserved capacity with a 99.99% SLA, private networking, custom regions.
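An OpenAI-compatible API means existing client code ports with a one-line change. A minimal sketch, assuming a hypothetical base URL, API-key environment variable, and catalog model name (none of these are documented values here):

```python
import os
from openai import OpenAI

# The standard OpenAI SDK, pointed at a TokenFactory endpoint.
# Base URL, env var, and model id below are assumptions for illustration.
client = OpenAI(
    base_url="https://api.tokenfactory.example/v1",
    api_key=os.environ["TOKENFACTORY_API_KEY"],
)

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # catalog name as it might appear; an assumption
    messages=[{"role": "user", "content": "Summarize this incident report: ..."}],
)
print(resp.choices[0].message.content)
```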
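If batch jobs mirror the OpenAI Batch API shape (an assumption, not a documented guarantee), submission looks like this: requests go in a JSONL file, and the long completion window is what buys the discount.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenfactory.example/v1",  # hypothetical, as above
    api_key=os.environ["TOKENFACTORY_API_KEY"],
)

# requests.jsonl: one request per line, e.g.
# {"custom_id": "doc-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "qwen3-coder", "messages": [...]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # async window; results are fetched later
)
print(batch.id, batch.status)
```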
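Post-training likewise needs no GPU wrangling if it follows the OpenAI fine-tuning job shape (again an assumption about the surface). The sketch covers SFT; LoRA, DPO, and RL would presumably be configured on the same job object.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenfactory.example/v1",  # hypothetical, as above
    api_key=os.environ["TOKENFACTORY_API_KEY"],
)

# train.jsonl: chat-formatted training examples, one JSON object per line.
train = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    model="qwen3-coder",  # base model name is an assumption
    training_file=train.id,
)
print(job.id, job.status)
```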
"We swapped four vendors and a homegrown serving stack for one TokenFactory endpoint. Latency improved, costs dropped, and our team got back to building."
What to use when
- Reasoning-heavy: gpt-oss-120B or DeepSeek V3.2.
- Long-context retrieval: Kimi-K2.5.
- Code generation and review: Qwen3-Coder.
- Multilingual, customer-facing: GLM-5.
- Agent loops and tool use: MiniMax M2.1 or Nemotron 3 Super.

Pick by workload, route by policy, and let TokenFactory handle the rest. A minimal routing sketch follows.
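In code, "route by policy" can start as a lookup from workload tag to catalog model. The tags, client wiring, and model identifiers below are assumptions; the mapping mirrors the guidance above.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.tokenfactory.example/v1",  # hypothetical, as above
    api_key=os.environ["TOKENFACTORY_API_KEY"],
)

# Workload tag -> catalog model. Identifiers are assumptions that mirror
# the recommendations above.
POLICY = {
    "reasoning":    "gpt-oss-120b",
    "long_context": "kimi-k2.5",
    "code":         "qwen3-coder",
    "multilingual": "glm-5",
    "agent":        "minimax-m2.1",
}

def route(workload: str, prompt: str) -> str:
    """Send the prompt to whichever model the policy picks for this workload."""
    model = POLICY.get(workload, "deepseek-v3.2")  # cost-efficient default
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(route("code", "Review this function for race conditions: ..."))
```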