Streaming Responses
Receive tokens as they are generated using server-sent events (SSE).
Why stream?
Without streaming, the API waits until the entire response has been generated before returning it, which can feel slow for long outputs. With streaming, each token is delivered as soon as it is generated, so text starts appearing almost immediately. This makes the application feel responsive and lets you render the reply incrementally, typewriter-style.
Python
Pass stream=True to chat.completions.create and iterate over the returned object.
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.gabforge.ai/v1",
    api_key=os.environ["GABFORGE_API_KEY"],
)

stream = client.chat.completions.create(
    model="gabforge-coder",
    messages=[{"role": "user", "content": "Explain the GIL in CPython."}],
    stream=True,
)

# Each chunk contains a delta with the new token(s)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)
print()
```
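In practice you usually want the full reply as well as the live printout, for example to store it in your conversation history. A minimal helper that does both is sketched below; it assumes the chunk shape shown in the example above (choices[0].delta.content), and the function name is illustrative:

```python
def collect_stream(stream):
    """Print each content delta as it arrives and return the assembled reply."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta
        # Some chunks (typically the first and last) carry no content,
        # only role/finish_reason metadata.
        if delta.content is not None:
            print(delta.content, end="", flush=True)
            parts.append(delta.content)
    print()
    return "".join(parts)
```

You would call it as `full_text = collect_stream(stream)` with the stream object from the example above.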
JavaScript (Node.js)
The for await...of loop processes each chunk as it arrives.
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai.gabforge.ai/v1",
  apiKey: process.env.GABFORGE_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gabforge-coder",
  messages: [{ role: "user", content: "Explain the GIL in CPython." }],
  stream: true,
});

// Process each chunk as it arrives
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
console.log();
```
Raw SSE format
If you are not using an SDK, each chunk arrives in the HTTP response body as a data: line containing JSON, with events separated by blank lines. The stream terminates with data: [DONE].
data: {"choices":[{"delta":{"content":"The"},"index":0}]}
data: {"choices":[{"delta":{"content":" GIL"},"index":0}]}
data: {"choices":[{"delta":{"content":" is"},"index":0}]}
data: [DONE]
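If you are parsing this format yourself, the per-line handling can be sketched as follows. This is a minimal parser for the data: lines shown above (it does not handle multi-line SSE events or other field names); the function name is illustrative:

```python
import json


def parse_sse_line(line: str):
    """Return the content delta from one SSE data: line, or None.

    Returns None for blank lines, the [DONE] sentinel, and
    metadata-only chunks that carry no content delta.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    event = json.loads(payload)
    return event["choices"][0]["delta"].get("content")
```

Feeding it the example lines above yields "The", " GIL", " is", then None for the terminator.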
Tips
- Token usage is not included in streaming chunks — count tokens client-side or use a separate non-streaming call for billing estimates.
- For web apps, forward the stream from your server to the browser via a chunked HTTP response — do not buffer it server-side.
- Always guard against delta.content being None: the first and last chunks often contain only role/finish_reason metadata.
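For the client-side counting mentioned in the first tip, a real tokenizer gives accurate numbers, but a coarse heuristic is often enough for progress display. The sketch below assumes roughly four characters per token for English text; this is only a rule of thumb, not the model's actual tokenizer, and the helper name is illustrative:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token on average.

    A heuristic only; use a real tokenizer, or a non-streaming call
    that reports usage, for anything billing-related.
    """
    return (len(text) + 3) // 4
```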