
Streaming Responses

Receive tokens as they are generated using server-sent events (SSE).

Why stream?

Without streaming, the API waits until the entire response is generated before returning it. For long outputs this can feel slow. With streaming, tokens arrive as soon as they are generated, so the first token typically appears within a fraction of a second instead of after the whole response is complete. This makes the application feel responsive and lets you render the reply incrementally, like a typing effect.

Python

Pass stream=True to chat.completions.create and iterate over the returned object.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.gabforge.ai/v1",
    api_key=os.environ["GABFORGE_API_KEY"],
)

stream = client.chat.completions.create(
    model="gabforge-coder",
    messages=[{"role": "user", "content": "Explain the GIL in CPython."}],
    stream=True,
)

# Each chunk contains a delta with the new token(s)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:
        print(delta.content, end="", flush=True)

print()

JavaScript (Node.js)

The for await...of loop processes each chunk as it arrives.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai.gabforge.ai/v1",
  apiKey: process.env.GABFORGE_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gabforge-coder",
  messages: [{ role: "user", content: "Explain the GIL in CPython." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(content);
}
console.log();

Raw SSE format

If you are not using an SDK, each chunk in the HTTP response body is a data: line containing a JSON payload; per the SSE format, events are separated by blank lines. The stream terminates with a final data: [DONE] sentinel.

data: {"choices":[{"delta":{"content":"The"},"index":0}]}

data: {"choices":[{"delta":{"content":" GIL"},"index":0}]}

data: {"choices":[{"delta":{"content":" is"},"index":0}]}

data: [DONE]
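If you parse the raw stream yourself, the steps are: skip non-data lines, stop at the [DONE] sentinel, decode the JSON payload, and pull the content out of the first choice's delta. A minimal sketch (parse_sse_stream is a hypothetical helper name, and the chunk shape assumed here matches the examples above):

```python
import json

def parse_sse_stream(lines):
    """Yield content deltas from raw SSE data: lines.

    `lines` is any iterable of decoded text lines, e.g. an
    HTTP response body read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator / keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content is not None:  # metadata-only chunks carry no content
            yield content

# Example using the chunks shown above
raw = [
    'data: {"choices":[{"delta":{"content":"The"},"index":0}]}',
    '',
    'data: {"choices":[{"delta":{"content":" GIL"},"index":0}]}',
    '',
    'data: [DONE]',
]
print("".join(parse_sse_stream(raw)))  # -> The GIL
```

Real streams should also handle partial lines at buffer boundaries; an HTTP client that exposes an iter_lines-style interface takes care of that for you.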

Tips

  • Token usage is not included in streaming chunks — count tokens client-side or use a separate non-streaming call for billing estimates.
  • For web apps, forward the stream from your server to the browser via a chunked HTTP response — do not buffer it server-side.
  • Always guard against delta.content being None — the first and last chunks often contain only role/finish_reason metadata.
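The None guard from the last tip can be exercised without calling the API by simulating a typical chunk sequence: a role-only first chunk, content chunks in between, and a finish_reason-only last chunk. The SimpleNamespace objects below are illustrative stand-ins for the SDK's chunk type, not the real classes:

```python
from types import SimpleNamespace as NS

def make_chunk(content=None, role=None, finish_reason=None):
    # Minimal stand-in exposing chunk.choices[0].delta.content
    delta = NS(content=content, role=role)
    return NS(choices=[NS(delta=delta, finish_reason=finish_reason)])

# Typical sequence: metadata-only chunks at both ends have content=None
stream = [
    make_chunk(role="assistant"),
    make_chunk(content="Hello"),
    make_chunk(content=", world"),
    make_chunk(finish_reason="stop"),
]

parts = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content is not None:  # guard: skip metadata-only chunks
        parts.append(delta.content)

full_text = "".join(parts)
print(full_text)  # -> Hello, world
```

Accumulating the deltas into a list like this is also the usual way to keep the complete message for logging or follow-up turns while still displaying tokens as they arrive.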