How We Built Real-Time AI Streaming with Vercel AI SDK
Marcus Rivera
Senior Engineer
One of the most important aspects of any AI application is perceived speed. Users don't want to wait 10 seconds staring at a loading spinner. They want to see the AI thinking in real time.
The Challenge
Traditional request-response patterns don't work well for LLM outputs. A typical GPT-4 response might take 8–15 seconds to fully generate. Making users wait that long feels broken.
Our Solution: Streaming with Vercel AI SDK
We use the Vercel AI SDK's streamText function to pipe tokens directly to the client as they're generated:
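import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  // The client sends the conversation history in the request body.
  const { messages } = await req.json();

  // Start generation; tokens become available as the model produces them.
  const result = streamText({
    model: openai("gpt-4o"),
    messages,
  });

  // Return a streaming HTTP response that the client can read incrementally.
  return result.toDataStreamResponse();
}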
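To make "piping tokens to the client as they're generated" concrete, here is a minimal sketch of the underlying idea using only web-standard APIs (`ReadableStream`, `Response`, `TextEncoder`). It is not the SDK's implementation, and it sends plain text rather than the SDK's data stream protocol. The names `fakeTokens`, `toStreamingResponse`, and `readChunks` are hypothetical, and the token source is a stand-in for a real model.

```typescript
// Hypothetical stand-in for a model: yields tokens with a delay between them.
async function* fakeTokens(tokens: string[], delayMs: number): AsyncGenerator<string> {
  for (const token of tokens) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield token;
  }
}

// Wrap an async token source in a streaming HTTP response.
function toStreamingResponse(source: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = source[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    // pull() runs whenever the consumer is ready for another chunk.
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}

// Read the body chunk by chunk instead of waiting for the whole response.
async function readChunks(response: Response): Promise<string[]> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  const chunks: string[] = [];
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(decoder.decode(value, { stream: true }));
  }
  return chunks;
}
```

A consumer can render each chunk as soon as `reader.read()` resolves, which is what lets the first words appear long before generation finishes.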
On the client, we use the useChat hook which handles the streaming protocol automatically:
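// In older SDK versions this hook is exported from "ai/react".
import { useChat } from "@ai-sdk/react";

// Inside a React component: useChat posts to the route and updates
// `messages` as streamed tokens arrive.
const { messages, input, handleSubmit } = useChat({
  api: "/api/chat",
});
Results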
- Time to first token: ~200 ms (down from an 8 s wait for the full response)
- Perceived latency: Near-instant for users
- Server costs: Same — streaming doesn't increase compute, just changes delivery
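For readers who want to check numbers like these in their own setup, here is a rough sketch of how time to first token can be measured against total generation time. It is an illustration, not the instrumentation used for the figures above. `measureStream`, `StreamTiming`, and `simulatedModel` are hypothetical names, and the simulated delays stand in for a real model.

```typescript
interface StreamTiming {
  timeToFirstTokenMs: number;
  totalMs: number;
  text: string;
}

// Consume any async token source and record when the first token arrived.
async function measureStream(source: AsyncIterable<string>): Promise<StreamTiming> {
  const start = Date.now();
  let firstTokenAt: number | null = null;
  let text = "";
  for await (const token of source) {
    if (firstTokenAt === null) firstTokenAt = Date.now();
    text += token;
  }
  const end = Date.now();
  return {
    // If the stream was empty, fall back to the total duration.
    timeToFirstTokenMs: (firstTokenAt ?? end) - start,
    totalMs: end - start,
    text,
  };
}

// Hypothetical token source: emits one token every `delayMs` milliseconds.
async function* simulatedModel(tokens: string[], delayMs: number): AsyncGenerator<string> {
  for (const token of tokens) {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    yield token;
  }
}

// Example usage with the simulated source.
async function demo(): Promise<void> {
  const timing = await measureStream(simulatedModel(["Stream", "ing ", "works"], 20));
  console.log(`first token: ${timing.timeToFirstTokenMs} ms, total: ${timing.totalMs} ms`);
}
```

On the server, the same function could be pointed at the `textStream` of a `streamText` result, assuming the installed SDK version exposes it as an async iterable of strings.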
Key Takeaways
Streaming isn't just a nice-to-have — it fundamentally changes how users perceive your AI product. The same model, same quality, same cost — but the experience feels 10x faster.