Available Models
Understand the characteristics of each model and when to use one over the other.
Model comparison
| Model | Backed by | Role | Latency | Context window |
|---|---|---|---|---|
| gabforge-coder | Qwen2.5-Coder-32B (local) | Primary — always tried first | Low (~300–800 ms to first token) | 128k tokens |
| claude-sonnet-4-6 | Anthropic Claude (cloud) | Automatic fallback under load | Low–medium (cloud-dependent) | 200k tokens |
gabforge-coder
gabforge-coder is powered by Qwen2.5-Coder-32B running on GabForge's own GPU infrastructure. It is specifically fine-tuned for code generation, debugging, and technical explanation across all major programming languages.
Best for
- Code generation and completion
- Debugging and code review
- Technical documentation and explanation
- High-volume, latency-sensitive applications
Cloud fallback (claude-sonnet-4-6)
When the local gabforge-coder model is under heavy load or temporarily unavailable, requests are automatically routed to claude-sonnet-4-6. This is fully transparent: the same API call returns the same response shape regardless of which backend served it. The only difference is the model field in the response body.
Checking which model served your request
```python
response = client.chat.completions.create(
    model="gabforge-coder",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Will be "gabforge-coder" or "claude-sonnet-4-6",
# depending on which backend served the request
print(response.model)
```
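Because fallback is transparent at the API level, the `model` field is the only reliable signal of which backend answered. If you want to track how often your traffic falls back to the cloud, one minimal sketch is to tally that field per request (the `record_backend` helper and the sample model IDs fed to it below are ours for illustration, not part of any SDK):

```python
from collections import Counter

FALLBACK_MODEL = "claude-sonnet-4-6"

def record_backend(counter: Counter, model_id: str) -> Counter:
    """Tally which backend served a request, keyed by the response's model field."""
    counter[model_id] += 1
    return counter

# In real code, model_id would come from response.model after each call.
stats = Counter()
for served in ["gabforge-coder", "gabforge-coder", "claude-sonnet-4-6"]:
    record_backend(stats, served)

fallback_rate = stats[FALLBACK_MODEL] / sum(stats.values())
print(f"fallback rate: {fallback_rate:.0%}")  # → fallback rate: 33%
```

A persistently high fallback rate is a signal that the local model is saturated, which may matter if your workload is latency-sensitive.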
Listing available models
You can always query the /v1/models endpoint to get the current list of available models.
```python
models = client.models.list()
for model in models.data:
    print(model.id)
```
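Since the available set can change (for example, while the local model is down for maintenance), you may prefer to choose a model from the listing rather than hard-coding one. A minimal sketch, assuming you have already extracted the IDs into a plain list (the `pick_model` helper and its preference order are illustrative, not part of the API):

```python
def pick_model(available, preferred=("gabforge-coder", "claude-sonnet-4-6")):
    """Return the first model in our preference order that the endpoint reports."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise ValueError(f"none of {preferred} available; got {available}")

# available would normally be [m.id for m in client.models.list().data]
print(pick_model(["claude-sonnet-4-6", "gabforge-coder"]))  # → gabforge-coder
```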