
Available Models

Understand the characteristics of each model and when to use one over the other.

Model comparison

| Model | Backed by | Role | Latency | Context window |
| --- | --- | --- | --- | --- |
| gabforge-coder | Qwen2.5-Coder-32B (local) | Primary — always tried first | Low (~300–800 ms to first token) | 128k tokens |
| claude-sonnet-4-6 | Anthropic Claude (cloud) | Automatic fallback under load | Low–medium (cloud-dependent) | 200k tokens |

gabforge-coder

gabforge-coder is powered by Qwen2.5-Coder-32B running on GabForge's own GPU infrastructure. It is specifically fine-tuned for code generation, debugging, and technical explanation across all major programming languages.

Best for

  • Code generation and completion
  • Debugging and code review
  • Technical documentation and explanation
  • High-volume, latency-sensitive applications
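As an illustrative sketch of the use cases above, a code-generation request body for gabforge-coder might look like the following (the payload shape assumes the OpenAI-compatible chat completions format used throughout this page; the prompt is a placeholder):

```python
import json

# Request body for a code-generation call. "gabforge-coder" is tried first;
# cloud fallback, if needed, is handled server-side with no change to this payload.
payload = {
    "model": "gabforge-coder",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
}

# Serialize for POSTing to the chat completions endpoint of your deployment.
body = json.dumps(payload)
print(body)
```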

Cloud fallback (claude-sonnet-4-6)

When the local gabforge-coder model is under heavy load or temporarily unavailable, requests are automatically routed to claude-sonnet-4-6. This is fully transparent — the same API call returns the same response shape regardless of which backend served it.

No action is required on your part. You can check which model actually served a given request via the model field in the response body.

Checking which model served your request

# Assumes `client` is an OpenAI-compatible SDK client configured for the GabForge API.
response = client.chat.completions.create(
    model="gabforge-coder",
    messages=[{"role": "user", "content": "Hello!"}],
)

# "gabforge-coder" or "claude-sonnet-4-6", depending on which backend served the request
print(response.model)
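Because response.model reveals which backend served each call, you can also track how often your traffic falls back to the cloud. A minimal sketch (the observed list below is hypothetical stand-in data, not real responses):

```python
def served_by_fallback(response_model: str, requested: str = "gabforge-coder") -> bool:
    """True when a request was served by a backend other than the one requested."""
    return response_model != requested

# Hypothetical response.model values collected over several requests.
observed = ["gabforge-coder", "claude-sonnet-4-6", "gabforge-coder", "gabforge-coder"]

# Fraction of requests routed to the cloud fallback.
fallback_rate = sum(served_by_fallback(m) for m in observed) / len(observed)
print(f"Fallback rate: {fallback_rate:.0%}")  # Fallback rate: 25%
```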

Listing available models

You can always query the /v1/models endpoint to get the current list of available models.

# Prints the id of every model currently served by the API.
models = client.models.list()
for model in models.data:
    print(model.id)
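For example, you can confirm that gabforge-coder is currently listed before sending traffic. A sketch (the ids list here is hypothetical; in practice it would come from the client.models.list() call above):

```python
def model_available(model_ids, wanted: str = "gabforge-coder") -> bool:
    """Return True if the wanted model id appears in the /v1/models listing."""
    return wanted in model_ids

# Hypothetical ids; in practice: [m.id for m in client.models.list().data]
ids = ["gabforge-coder", "claude-sonnet-4-6"]

if model_available(ids):
    print("gabforge-coder is available")
```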