
Available Models

Understand the characteristics of each model and when to use one over the other.

Model comparison

| Model | Backed by | Role | Latency | Context window |
| --- | --- | --- | --- | --- |
| gabforge-coder | Qwen2.5-Coder-32B (local) | Primary — always tried first | Low (~300–800 ms to first token) | 128k tokens |
| claude-sonnet-4-6 | Anthropic Claude (cloud) | Automatic fallback under load | Low–medium (cloud-dependent) | 200k tokens |

gabforge-coder

gabforge-coder is powered by Qwen2.5-Coder-32B running on GabForge's own GPU infrastructure. It is specifically fine-tuned for code generation, debugging, and technical explanation across all major programming languages.

Best for

  • Code generation and completion
  • Debugging and code review
  • Technical documentation and explanation
  • High-volume, latency-sensitive applications
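As an illustrative sketch of the use cases above, a code-generation request body for gabforge-coder might look like the following (the payload shape assumes the OpenAI-compatible chat completions format used throughout this page; the prompt is a placeholder):

```python
import json

# Request body for a code-generation call. "gabforge-coder" is tried first;
# cloud fallback, if needed, is handled server-side with no change to this payload.
payload = {
    "model": "gabforge-coder",
    "messages": [
        {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
}

# Serialize for POSTing to the chat completions endpoint of your deployment.
body = json.dumps(payload)
print(body)
```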

Cloud fallback (claude-sonnet-4-6)

When the local gabforge-coder model is under heavy load or temporarily unavailable, requests are automatically routed to claude-sonnet-4-6. This is fully transparent — the same API call returns the same response shape regardless of which backend served it.

No action is required on your part. You can check which model actually served a given request via the model field in the response body.

Checking which model served your request

# Assumes `client` is an OpenAI-compatible SDK client configured for the GabForge API.
response = client.chat.completions.create(
    model="gabforge-coder",
    messages=[{"role": "user", "content": "Hello!"}],
)

# "gabforge-coder" or "claude-sonnet-4-6", depending on which backend served the request
print(response.model)
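Because response.model reveals which backend served each call, you can also track how often your traffic falls back to the cloud. A minimal sketch (the observed list below is hypothetical stand-in data, not real responses):

```python
def served_by_fallback(response_model: str, requested: str = "gabforge-coder") -> bool:
    """True when a request was served by a backend other than the one requested."""
    return response_model != requested

# Hypothetical response.model values collected over several requests.
observed = ["gabforge-coder", "claude-sonnet-4-6", "gabforge-coder", "gabforge-coder"]

# Fraction of requests routed to the cloud fallback.
fallback_rate = sum(served_by_fallback(m) for m in observed) / len(observed)
print(f"Fallback rate: {fallback_rate:.0%}")  # Fallback rate: 25%
```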

Listing available models

You can always query the /v1/models endpoint to get the current list of available models.

# Prints the id of every model currently served by the API.
models = client.models.list()
for model in models.data:
    print(model.id)
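For example, you can confirm that gabforge-coder is currently listed before sending traffic. A sketch (the ids list here is hypothetical; in practice it would come from the client.models.list() call above):

```python
def model_available(model_ids, wanted: str = "gabforge-coder") -> bool:
    """Return True if the wanted model id appears in the /v1/models listing."""
    return wanted in model_ids

# Hypothetical ids; in practice: [m.id for m in client.models.list().data]
ids = ["gabforge-coder", "claude-sonnet-4-6"]

if model_available(ids):
    print("gabforge-coder is available")
```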