Engineering | Apr 5, 2026 | 6 min read

How We Built Real-Time AI Streaming with Vercel AI SDK

Marcus Rivera

Senior Engineer

One of the most important aspects of any AI application is perceived speed. Users don't want to wait 10 seconds staring at a loading spinner. They want to see the AI thinking in real time.

The Challenge

Traditional request-response patterns don't work well for LLM outputs. A typical GPT-4 response might take 8–15 seconds to fully generate. Making users wait that long feels broken.

Our Solution: Streaming with Vercel AI SDK

We use the Vercel AI SDK's streamText function to pipe tokens directly to the client as they're generated:

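```typescript
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";

export async function POST(req: Request) {
  // useChat posts the conversation so far as `messages`.
  const { messages } = await req.json();

  // Start generation; the result exposes the output as a stream.
  const result = streamText({
    model: openai("gpt-4o"),
    messages,
  });

  // Forward tokens to the client as they arrive.
  return result.toDataStreamResponse();
}
```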
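To see what "piping tokens as they're generated" means at the HTTP level, here is a minimal, dependency-free sketch using only the standard `ReadableStream` and `Response` APIs. It is not the SDK's implementation or its wire protocol; `fakeTokens` and `tokensToResponse` are hypothetical names, and the generator stands in for a real model.

```typescript
// Hypothetical stand-in for a model: yields tokens one at a time.
async function* fakeTokens(): AsyncGenerator<string> {
  for (const token of ["Streaming ", "feels ", "fast."]) {
    yield token;
  }
}

// Wrap any async token source in a streaming HTTP response.
// Each token is enqueued as soon as it is available, so the client
// can start rendering before the full text exists.
function tokensToResponse(tokens: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { value, done } = await iterator.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
  });
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The response is returned before the generator has finished, which is the property that makes the first token show up early. The SDK adds a structured protocol on top of this idea, so treat the sketch as an illustration only.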

On the client, we use the useChat hook which handles the streaming protocol automatically:

```typescript
import { useChat } from "@ai-sdk/react";

// Inside a React client component:
const { messages, input, handleSubmit } = useChat({
  api: "/api/chat",
});
```
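For intuition about what a hook like this spares you from writing, the following is a rough sketch of reading a streamed body by hand. It assumes a plain-text stream rather than the SDK's data stream format, and `readTextStream` is a hypothetical helper, not part of any library.

```typescript
// Read a streamed body chunk by chunk, reporting the text received so far.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    text += decoder.decode(value, { stream: true });
    onUpdate(text);
  }
  text += decoder.decode(); // flush any bytes still buffered in the decoder
  return text;
}
```

In a browser you might pass the `body` of a `fetch` response as the first argument and a state setter as `onUpdate`, so the UI re-renders on every chunk.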

Results

Time to first token: ~200ms (previously, users waited 8s for the full response before seeing anything)
Perceived latency: Near-instant for users
Server costs: Same — streaming doesn't increase compute, just changes delivery
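The two latency figures above measure different things, and the distinction is easy to sketch. Assuming you record a timestamp when the request starts and one for each chunk that arrives (for example with `performance.now()`), a hypothetical helper like `summarizeTiming` could derive both numbers. The values in the usage line are illustrative, not our measurements.

```typescript
interface StreamTiming {
  timeToFirstTokenMs: number;
  totalMs: number;
}

// Given the request start time and the arrival time of each chunk
// (all in milliseconds), compute time to first token and total duration.
function summarizeTiming(
  requestStartMs: number,
  chunkArrivalsMs: number[],
): StreamTiming {
  if (chunkArrivalsMs.length === 0) {
    throw new Error("no chunks received");
  }
  const first = chunkArrivalsMs[0];
  const last = chunkArrivalsMs[chunkArrivalsMs.length - 1];
  return {
    timeToFirstTokenMs: first - requestStartMs,
    totalMs: last - requestStartMs,
  };
}

// Illustrative numbers only.
const timing = summarizeTiming(1000, [1200, 4000, 9000]);
console.log(timing); // { timeToFirstTokenMs: 200, totalMs: 8000 }
```

Streaming leaves `totalMs` roughly unchanged. What improves is `timeToFirstTokenMs`, which is the number users actually feel.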

Key Takeaways

Streaming isn't just a nice-to-have — it fundamentally changes how users perceive your AI product. The same model, same quality, same cost — but the experience feels 10x faster.
