LLMCloud · Open-model inference

Every open model. One API. Built for production.

Serving the leading open-source LLMs: gpt-oss, Kimi, Qwen, GLM, DeepSeek, Llama. Pay-as-you-go, batch, fine-tuning, or dedicated.

Get an API key · See the quickstart

40+

Open models

Max context

99.99%

Enterprise SLA

−50%

Batch savings

Services

Four ways to run open models in production.

Real-time inference

OpenAI-compatible API. Sub-second latency, 99.9% uptime, pay per token.

Batch inference

Async processing at up to 50% lower cost — evals, embeddings, pipelines.
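
As a rough illustration of preparing an async job, the sketch below writes one request per line in the JSONL layout used by the OpenAI Batch API. That LLMCloud accepts this exact layout is an assumption (the page only says the API is OpenAI-compatible), and `to_batch_jsonl` and the `req-N` IDs are hypothetical names chosen for the example.

```python
import json


def to_batch_jsonl(prompts, model):
    """Serialize prompts as JSONL, one chat-completion request per line.

    The line layout (custom_id / method / url / body) follows the OpenAI
    Batch API input format; support for it on LLMCloud is assumed here.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"req-{i}",  # hypothetical ID scheme for matching results
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    # Terminate every line with a newline so the output is valid JSONL.
    return "".join(line + "\n" for line in lines)


payload = to_batch_jsonl(["Summarize doc A.", "Summarize doc B."], "gpt-oss-120b")
```

The resulting string can be written to a file and submitted through whatever upload step the batch service documents.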

Fine-tuning

LoRA, SFT, DPO and RL on your data. Bring a dataset, get a model.

Dedicated deployments

Reserved GPUs, private networking, 99.99% SLA.

Models

A catalog that ships with the frontier.

New open releases evaluated, optimized and added within days. Same API.

gpt-oss-120B

General · Reasoning

128K context

Kimi-K2.5

Long context · Agents

2M context

Qwen3-Coder-480B

Code · MoE

256K context

GLM-5

Multilingual · Reasoning

128K context

DeepSeek V3.2

Reasoning · MoE

128K context

MiniMax M2.1

Multimodal · Agents

1M context

Llama 4 405B

General purpose

256K context

Mistral Large 3

European · Tooling

128K context

Nemotron 3 Super

Enterprise · Reasoning

128K context

Quickstart

OpenAI-compatible. A drop-in replacement in 3 lines.

Python · api.llmcloud.ai

from openai import OpenAI

# Point the standard OpenAI client at the LLMCloud endpoint.
client = OpenAI(
    base_url="https://api.llmcloud.ai/v1",
    api_key="lc_live_…",
)

# Request a chat completion from a catalog model.
resp = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Summarize this report."}],
)
print(resp.choices[0].message.content)

cURL · api.llmcloud.ai

curl https://api.llmcloud.ai/v1/chat/completions \
  -H "Authorization: Bearer $LC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
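
For environments without the `openai` package, the same request can be sketched with only the Python standard library. The URL, headers, and body mirror the cURL call above; the helper names `build_chat_request` and `send` are hypothetical, and reading the key from `LC_API_KEY` simply follows the cURL example.

```python
import json
import os
import urllib.request

API_URL = "https://api.llmcloud.ai/v1/chat/completions"


def build_chat_request(model, user_text, api_key=None):
    """Build (but do not send) a chat completion request."""
    # Fall back to the LC_API_KEY environment variable used in the cURL example.
    key = api_key or os.environ.get("LC_API_KEY", "")
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Usage (performs a network call, so it is left commented out):
# reply = send(build_chat_request("deepseek-v3.2", "Hello"))
# print(reply["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any tokens are billed.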

Pricing

Pay only for the tokens you use.

Free

$0 to start

Free API key. Pay only for tokens consumed.

OpenAI-compatible API
All open models
Batch + real-time
Community support

Get an API key

Scale

$500/month minimum

Higher rate limits and analytics for production.

10× rate limits
Priority queue
Usage analytics
24h email SLA

Start scaling

Enterprise

Custom · annual

Dedicated capacity, VPC peering, fine-tuning included.

Reserved GPUs
VPC / BYOC
Fine-tuning included
99.99% SLA · 24/7

Talk to sales

Token prices vary by model. Batch inference costs up to 50% less than real-time.
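
To make the pay-per-token arithmetic concrete, here is a minimal sketch of a cost estimate. The $0.60 per million tokens figure is a made-up placeholder, not a published LLMCloud price, and the function assumes a single blended rate and the best-case 50% batch discount; actual rates and discounts vary by model.

```python
from decimal import Decimal

# "Up to 50%" on the page; this sketch assumes the best case.
BATCH_DISCOUNT = Decimal("0.50")


def estimate_cost(tokens, price_per_million, batch=False, discount=BATCH_DISCOUNT):
    """Estimate spend in dollars for a token count at a per-million-token price."""
    cost = Decimal(tokens) / Decimal(1_000_000) * Decimal(str(price_per_million))
    if batch:
        # Apply the (assumed) batch discount to the real-time cost.
        cost *= Decimal(1) - discount
    return cost.quantize(Decimal("0.01"))


# Hypothetical price: $0.60 per million tokens.
realtime = estimate_cost(20_000_000, "0.60")            # 12.00
batched = estimate_cost(20_000_000, "0.60", batch=True)  # 6.00
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request costs are summed.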