AI Streaming: Patterns That Matter in Production
Streaming feels great when it’s smooth. It feels broken when it janks, duplicates text, or can’t cancel. These are the patterns I reach for when building streaming AI features with the Vercel AI SDK.
1) The core loop: async iteration
Start here. Keep it boring.
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"
const result = streamText({
model: openai("gpt-4-turbo"),
prompt: "Explain recursion simply.",
})
for await (const chunk of result.textStream) {
process.stdout.write(chunk)
}
This works because the consumer pulls chunks at its own pace.
2) Cancellation: treat it as a first-class feature
If users can’t stop a response, streaming isn’t “nice to have” — it’s frustrating.
const controller = new AbortController()
await fetch("/api/chat", {
method: "POST",
body: JSON.stringify({ prompt }),
signal: controller.signal,
})
// call this on “Stop”
controller.abort()
Rule: wire cancel on day one. Everything else can iterate.
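Aborting the fetch only stops the client; the server should forward the abort so the model call stops too. Here is a minimal sketch assuming a Next.js-style route handler (the file path is illustrative) — `streamText` accepts an `abortSignal`, and the request’s own signal fires when the client’s fetch is aborted.

```typescript
// app/api/chat/route.ts — hypothetical path; adjust to your routing setup
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const result = streamText({
    model: openai("gpt-4-turbo"),
    prompt,
    // Forward the client's abort: when the browser aborts the fetch,
    // req.signal fires and the SDK stops consuming provider tokens.
    abortSignal: req.signal,
  })

  return result.toTextStreamResponse()
}
```

Without this, “Stop” only hides the output while the server keeps paying for tokens.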
3) Buffering: a small queue solves most edge cases
You only need a queue when you want to decouple producer and consumer (batching, scheduling, multiple consumers).
class TokenBuffer {
private queue: string[] = []
private waiters: Array<(value: string) => void> = []
push(token: string) {
const waiter = this.waiters.shift()
if (waiter) waiter(token)
else this.queue.push(token)
}
async pull(): Promise<string> {
// Check length, not truthiness: an empty-string token is still a token.
if (this.queue.length > 0) return this.queue.shift()!
return new Promise((resolve) => this.waiters.push(resolve))
}
}
This is the producer/consumer pattern you’ll reuse everywhere.
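To see the hand-off in action, here is a self-contained sketch (the class is repeated so the snippet runs on its own): the producer pushes tokens as they arrive, and the consumer pulls at its own pace, blocking only when the buffer is empty.

```typescript
class TokenBuffer {
  private queue: string[] = []
  private waiters: Array<(value: string) => void> = []
  push(token: string) {
    const waiter = this.waiters.shift()
    if (waiter) waiter(token)
    else this.queue.push(token)
  }
  async pull(): Promise<string> {
    if (this.queue.length > 0) return this.queue.shift()!
    return new Promise((resolve) => this.waiters.push(resolve))
  }
}

async function demo(): Promise<string> {
  const buffer = new TokenBuffer()

  // Producer: pushes tokens as they arrive (simulated here).
  const tokens = ["Hello", ", ", "world"]
  for (const t of tokens) buffer.push(t)

  // Consumer: pulls one token at a time.
  const out: string[] = []
  for (let i = 0; i < tokens.length; i++) out.push(await buffer.pull())
  return out.join("")
}
```

Note the symmetry: `push` resolves a waiting `pull` if one exists, and `pull` parks a resolver if no token is queued. That is the whole trick.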
4) Transforms: modify the stream without rewriting your pipeline
If you need redaction, formatting, or annotations, transforms are clean and composable.
const uppercaseStream = result.textStream.pipeThrough(
new TransformStream<string, string>({
transform(chunk, controller) {
controller.enqueue(chunk.toUpperCase())
},
})
)
for await (const chunk of uppercaseStream) {
console.log(chunk)
}
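The same shape works for the redaction case mentioned above. A sketch, with one caveat called out in the comments: naive per-chunk matching can miss a pattern split across chunk boundaries, so a production version would carry a small tail buffer between chunks.

```typescript
// Redacts email-like strings from each chunk. Caveat: a pattern split
// across two chunks will slip through; buffer a tail to handle that.
const redactEmails = new TransformStream<string, string>({
  transform(chunk, controller) {
    controller.enqueue(chunk.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[redacted]"))
  },
})

async function run(): Promise<string> {
  // Stand-in source stream so the example is self-contained.
  const source = new ReadableStream<string>({
    start(controller) {
      controller.enqueue("Contact me at alice@example.com today.")
      controller.close()
    },
  })

  let out = ""
  for await (const chunk of source.pipeThrough(redactEmails)) out += chunk
  return out
}
```

Because transforms compose, you can chain `pipeThrough` calls — redact, then format, then annotate — without any stage knowing about the others.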
5) Accumulation: keep the full text only when you need it
Streaming UI usually only needs append-only rendering. Accumulate only when you need the full response for parsing, logging, or post-processing.
let full = ""
for await (const chunk of result.textStream) {
full += chunk
// stream chunk to UI
}
// full response available here
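If you want the full text only after the stream ends, the SDK can accumulate for you: the `streamText` result exposes `text`, a promise that resolves once the stream finishes. A sketch:

```typescript
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"

const result = streamText({
  model: openai("gpt-4-turbo"),
  prompt: "Explain recursion simply.",
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk) // stream to the UI as before
}

// Resolves after the stream completes; no manual concatenation needed.
const full = await result.text
```

Manual accumulation is still useful when you need to inspect partial text mid-stream; otherwise, let the SDK do it.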
The takeaway
Most streaming issues are not “AI problems”. They are UI and state problems: rendering smoothly, cancelling cleanly, and keeping buffered state consistent.