AI Streaming: Patterns That Matter in Production
Streaming feels great when it’s smooth. It feels broken when it janks, duplicates text, or can’t cancel. These are the patterns I reach for when building streaming AI features with the Vercel AI SDK.
1) The core loop: async iteration
Start here. Keep it boring.
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"
const result = streamText({
model: openai("gpt-4-turbo"),
prompt: "Explain recursion simply.",
})
for await (const chunk of result.textStream) {
process.stdout.write(chunk)
}
This works because the consumer pulls chunks at its own pace.
2) Cancellation: treat it as a first-class feature
If users can’t stop a response, streaming isn’t “nice to have” — it’s frustrating.
const controller = new AbortController()
await fetch("/api/chat", {
method: "POST",
body: JSON.stringify({ prompt }),
signal: controller.signal,
})
// call this on “Stop”
controller.abort()
Rule: wire cancel on day one. Everything else can iterate.
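Aborting the fetch only stops the client; the server should forward the abort so the model call stops too. Here is a minimal sketch assuming a Next.js-style route handler (the file path is illustrative) — `streamText` accepts an `abortSignal`, and the request’s own signal fires when the client’s fetch is aborted.

```typescript
// app/api/chat/route.ts — hypothetical path; adjust to your routing setup
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const result = streamText({
    model: openai("gpt-4-turbo"),
    prompt,
    // Forward the client's abort: when the browser aborts the fetch,
    // req.signal fires and the SDK stops consuming provider tokens.
    abortSignal: req.signal,
  })

  return result.toTextStreamResponse()
}
```

Without this, “Stop” only hides the output while the server keeps paying for tokens.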
3) Buffering: a small queue solves most edge cases
You only need a queue when you want to decouple producer and consumer (batching, scheduling, multiple consumers).
class TokenBuffer {
private queue: string[] = []
private waiters: Array<(value: string) => void> = []
push(token: string) {
const waiter = this.waiters.shift()
if (waiter) waiter(token)
else this.queue.push(token)
}
async pull(): Promise<string> {
// Check length, not truthiness: an empty-string token is still a token.
if (this.queue.length > 0) return this.queue.shift()!
return new Promise((resolve) => this.waiters.push(resolve))
}
}
This is the producer/consumer pattern you’ll reuse everywhere.
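To see the hand-off in action, here is a self-contained sketch (the class is repeated so the snippet runs on its own): the producer pushes tokens as they arrive, and the consumer pulls at its own pace, blocking only when the buffer is empty.

```typescript
class TokenBuffer {
  private queue: string[] = []
  private waiters: Array<(value: string) => void> = []
  push(token: string) {
    const waiter = this.waiters.shift()
    if (waiter) waiter(token)
    else this.queue.push(token)
  }
  async pull(): Promise<string> {
    if (this.queue.length > 0) return this.queue.shift()!
    return new Promise((resolve) => this.waiters.push(resolve))
  }
}

async function demo(): Promise<string> {
  const buffer = new TokenBuffer()

  // Producer: pushes tokens as they arrive (simulated here).
  const tokens = ["Hello", ", ", "world"]
  for (const t of tokens) buffer.push(t)

  // Consumer: pulls one token at a time.
  const out: string[] = []
  for (let i = 0; i < tokens.length; i++) out.push(await buffer.pull())
  return out.join("")
}
```

Note the symmetry: `push` resolves a waiting `pull` if one exists, and `pull` parks a resolver if no token is queued. That is the whole trick.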
4) Transforms: modify the stream without rewriting your pipeline
If you need redaction, formatting, or annotations, transforms are clean and composable.
const uppercaseStream = result.textStream.pipeThrough(
new TransformStream<string, string>({
transform(chunk, controller) {
controller.enqueue(chunk.toUpperCase())
},
})
)
for await (const chunk of uppercaseStream) {
console.log(chunk)
}
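The same shape works for the redaction case mentioned above. A sketch, with one caveat called out in the comments: naive per-chunk matching can miss a pattern split across chunk boundaries, so a production version would carry a small tail buffer between chunks.

```typescript
// Redacts email-like strings from each chunk. Caveat: a pattern split
// across two chunks will slip through; buffer a tail to handle that.
const redactEmails = new TransformStream<string, string>({
  transform(chunk, controller) {
    controller.enqueue(chunk.replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[redacted]"))
  },
})

async function run(): Promise<string> {
  // Stand-in source stream so the example is self-contained.
  const source = new ReadableStream<string>({
    start(controller) {
      controller.enqueue("Contact me at alice@example.com today.")
      controller.close()
    },
  })

  let out = ""
  for await (const chunk of source.pipeThrough(redactEmails)) out += chunk
  return out
}
```

Because transforms compose, you can chain `pipeThrough` calls — redact, then format, then annotate — without any stage knowing about the others.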
5) Accumulation: keep the full text only when you need it
Streaming UI usually only needs append-only rendering. Accumulate only when you need the full response for parsing, logging, or post-processing.
let full = ""
for await (const chunk of result.textStream) {
full += chunk
// stream chunk to UI
}
// full response available here
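If you want the full text only after the stream ends, the SDK can accumulate for you: the `streamText` result exposes `text`, a promise that resolves once the stream finishes. A sketch:

```typescript
import { streamText } from "ai"
import { openai } from "@ai-sdk/openai"

const result = streamText({
  model: openai("gpt-4-turbo"),
  prompt: "Explain recursion simply.",
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk) // stream to the UI as before
}

// Resolves after the stream completes; no manual concatenation needed.
const full = await result.text
```

Manual accumulation is still useful when you need to inspect partial text mid-stream; otherwise, let the SDK do it.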
The takeaway
Most streaming issues are not “AI problems”. They are UI and state problems: rendering smoothly, cancelling cleanly, and keeping buffered state consistent.