Basic Streaming
Streaming returns an iterator of chunks, each a ModelResponse object. During streaming, most chunks contain a small piece of the content; the final chunk includes usage metrics.
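A minimal sketch of the consumption loop. Because the exact client API is not shown here, `FakeModelResponse` and `fake_invoke_stream` are stand-ins for the real ModelResponse type and `invoke_stream()` call; the key pattern is the same: iterate, render content chunks as they arrive, and read usage metrics off the final chunk.

```python
# Hypothetical sketch — the dataclass and generator below are stand-ins for the
# real client's ModelResponse and invoke_stream(), which are assumed, not shown.
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class FakeModelResponse:
    content: str = ""
    usage: Optional[dict] = None  # populated only on the final chunk

def fake_invoke_stream() -> Iterator[FakeModelResponse]:
    # Stand-in for client.invoke_stream(...): most chunks carry content,
    # the last one carries usage metrics.
    yield FakeModelResponse(content="Hello, ")
    yield FakeModelResponse(content="world!")
    yield FakeModelResponse(usage={"input_tokens": 3, "output_tokens": 2})

pieces = []
for chunk in fake_invoke_stream():
    if chunk.content:
        pieces.append(chunk.content)   # render to the user as it arrives
    if chunk.usage is not None:
        print("usage:", chunk.usage)   # final chunk includes usage metrics

print("".join(pieces))  # "Hello, world!"
```

The same loop works unchanged with `async for` against an `ainvoke_stream()`-style coroutine.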
Streaming with Tools
When the model decides to call a tool during streaming, you'll receive chunks with tool_calls instead of content:
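Tool-call arguments typically arrive in fragments across several chunks, so the caller accumulates them by index until the stream ends. The sketch below assumes a delta shape (`index`, `name`, `arguments`) modeled on common streaming APIs; the real library's tool_calls structure may differ.

```python
# Hypothetical sketch — ToolCallDelta and fake_tool_stream are stand-ins; the
# real chunk/tool_calls shape is an assumption, not taken from this document.
from dataclasses import dataclass, field

@dataclass
class ToolCallDelta:
    index: int          # which tool call this fragment belongs to
    name: str = ""
    arguments: str = "" # argument JSON often arrives in fragments

@dataclass
class FakeModelResponse:
    content: str = ""
    tool_calls: list = field(default_factory=list)

def fake_tool_stream():
    yield FakeModelResponse(tool_calls=[ToolCallDelta(0, name="get_weather")])
    yield FakeModelResponse(tool_calls=[ToolCallDelta(0, arguments='{"city": ')])
    yield FakeModelResponse(tool_calls=[ToolCallDelta(0, arguments='"Paris"}')])

# Accumulate fragments by index until the stream ends.
calls = {}
for chunk in fake_tool_stream():
    for delta in chunk.tool_calls:
        entry = calls.setdefault(delta.index, {"name": "", "arguments": ""})
        entry["name"] += delta.name
        entry["arguments"] += delta.arguments

print(calls[0])  # {'name': 'get_weather', 'arguments': '{"city": "Paris"}'}
```

Only parse the accumulated `arguments` string as JSON once the stream is complete; mid-stream it is usually an invalid fragment.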
Streaming with Reasoning
Models that support reasoning (such as DeepSeek Reasoner or OpenAI o1) emit their reasoning content in chunks before the final answer.

Collecting the Full Response
You can stream output to the user while also capturing the complete response by accumulating each chunk's content as it arrives.

Streaming vs Non-Streaming
| | invoke() / ainvoke() | invoke_stream() / ainvoke_stream() |
|---|---|---|
| Latency | Waits for full response | First token arrives immediately |
| Return type | Single ModelResponse | Iterator of ModelResponse chunks |
| Usage metrics | Available on response | Available on final chunk |
| Best for | Background processing, short responses | User-facing output, long responses |
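The trade-off in the table can be sketched side by side. As above, `fake_invoke` and `fake_invoke_stream` are stand-ins for the real `invoke()` / `invoke_stream()` calls: the streaming path shows each piece immediately but still ends up with the same complete text and usage metrics as the blocking call.

```python
# Hypothetical sketch — fake_invoke / fake_invoke_stream stand in for the real
# client methods; the ModelResponse fields used here are assumptions.
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class FakeModelResponse:
    content: str = ""
    usage: Optional[dict] = None

def fake_invoke() -> FakeModelResponse:
    # Blocking call: one complete response, usage attached.
    return FakeModelResponse(content="Hello, world!", usage={"total_tokens": 5})

def fake_invoke_stream() -> Iterator[FakeModelResponse]:
    # Streaming call: content in pieces, usage only on the final chunk.
    yield FakeModelResponse(content="Hello, ")
    yield FakeModelResponse(content="world!")
    yield FakeModelResponse(usage={"total_tokens": 5})

# Non-streaming: everything arrives at once.
response = fake_invoke()

# Streaming: render each piece immediately, but keep a running copy so the
# caller still ends up with the complete text and the final usage metrics.
full_text, usage = "", None
for chunk in fake_invoke_stream():
    full_text += chunk.content
    if chunk.usage is not None:
        usage = chunk.usage

assert full_text == response.content and usage == response.usage
```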