Insights · May 30, 2026 · 6 min read

Streaming Claude into KaTeX: the 80-character batching trick.

Why naive token-by-token streaming breaks LaTeX rendering — and the small batching pattern from Mathmatika that makes streaming math feel native.

[Image: a streamed math explanation mid-flight. A partial sentence ends mid-LaTeX expression, the formula is half-rendered, and a small batching counter ticks 0/80 in the corner.]

If you stream Claude tokens directly into a KaTeX renderer, the output flickers and re-renders on every chunk, and the math becomes unreadable. Here's the small batching pattern from Mathmatika that fixes it, and why the lesson generalizes to every content type that needs a complete expression to render correctly.

Mathmatika is our AI math and DSA learning platform. Students upload PDFs, ask questions, and get back streaming explanations with KaTeX-rendered formulas. "Streaming" here is the part everyone underestimates.

Token streaming is wonderful for English prose. Each token arrives, the UI appends it, the user perceives the response as fast even though the underlying model is taking its time. This is the default streaming UX of every modern AI chat product.

Now try it with math. Take `\frac{1}{2}` and imagine it arriving one character at a time (real tokens are usually longer, but they still split expressions at arbitrary points). First comes `\`. The KaTeX renderer doesn't know that's a `\frac` yet. Then `f`, `r`, `a`, `c`. Still doesn't know. Then `{`, `1`, `}`, `{`, `2`, `}`. At every step before the last, KaTeX is being asked to render an incomplete LaTeX expression. It throws, the UI shows red error boxes, and the user sees broken math.

The naive fix doesn't work

The first instinct is to detect when you're inside a LaTeX expression and pause rendering until the closing brace. This is fragile in practice. LaTeX expressions can be deeply nested, so detection requires real parsing: a tiny LaTeX tokenizer inside your render loop, with state maintained across stream chunks.

It can be done. We tried. The code is unfun to debug, and it gets worse the moment Claude emits environments like `\begin{cases}` or aligned multi-line equations.
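To make the fragility concrete, here is a minimal TypeScript sketch of the brace-counting gate described above. `BraceGate` and `feed` are hypothetical names, not Mathmatika code, and the sketch stays as simple as the first instinct suggests: it tracks `{`/`}` depth across chunks and skips the character after a backslash.

```typescript
// Hypothetical sketch of the "naive fix": count braces across stream chunks
// and only allow a render when every `{` has been closed. Shown to illustrate
// why the approach is fragile, not as a recommendation.
class BraceGate {
  private depth = 0;              // current `{` nesting depth
  private afterBackslash = false; // previous character was `\`

  // Feed one stream chunk; returns true if braces are balanced after it.
  // State persists between calls because a brace can open in one chunk
  // and close in a later one.
  feed(chunk: string): boolean {
    for (const ch of chunk) {
      if (this.afterBackslash) {
        // Skip the character after `\` so escaped braces like `\{` don't count.
        this.afterBackslash = false;
        continue;
      }
      if (ch === "\\") this.afterBackslash = true;
      else if (ch === "{") this.depth++;
      else if (ch === "}") this.depth = Math.max(0, this.depth - 1);
    }
    return this.depth === 0;
  }
}

const gate = new BraceGate();
gate.feed("\\frac{1");                    // false: one brace still open
gate.feed("}{2}");                        // true: `\frac{1}{2}` is complete
new BraceGate().feed("\\frac");           // true, yet `\frac` alone is not renderable
new BraceGate().feed("\\begin{cases} x"); // true, yet the environment is still open
```

The last two calls pass the gate even though KaTeX would reject both inputs: `\frac` has no arguments yet, and `\begin{cases}` opens an environment that brace depth knows nothing about. Closing those holes means tracking commands and environments too, which is exactly the tokenizer you were hoping not to write.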

The 80-character batching trick

What we actually shipped: instead of rendering every incoming token, the streaming layer accumulates incoming characters in an 80-character buffer. The renderer fires only when the buffer reaches 80 characters or when the stream completes, and the buffer is emptied each time it fires.
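Here is a minimal TypeScript sketch of that batching layer, assuming the stream hands you text chunks and the renderer accepts the full text received so far. `CharBatcher`, `push`, and `complete` are hypothetical names for illustration, not Mathmatika's actual code.

```typescript
// Minimal sketch of character-count batching (hypothetical names).
// Chunks accumulate in a buffer; once it holds at least `batchSize`
// characters, or the stream completes, the renderer is called with the
// full text so far and the buffer is reset.
class CharBatcher {
  private rendered = ""; // everything already handed to the renderer
  private buffer = "";   // characters received since the last render

  constructor(
    private readonly render: (fullText: string) => void,
    private readonly batchSize = 80,
  ) {}

  push(chunk: string): void {
    this.buffer += chunk;
    // Chunks can overshoot the threshold, so flush on ">=" rather than "===".
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  complete(): void {
    this.flush(); // render whatever is left, even if shorter than batchSize
  }

  private flush(): void {
    if (this.buffer.length === 0) return; // nothing new to show
    this.rendered += this.buffer;
    this.buffer = "";
    this.render(this.rendered); // re-render the whole section, not just the delta
  }
}

const frames: string[] = [];
const batcher = new CharBatcher((fullText) => frames.push(fullText));
for (let i = 0; i < 10; i++) batcher.push("0123456789"); // 100 chars in 10-char chunks
batcher.complete();
// frames.length === 2: one render at 80 characters, one at completion with all 100
```

Two details matter. The check is `>=` rather than `===` because a single chunk can push the buffer past 80. And `complete()` skips the render when nothing new has arrived since the last flush.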

Eighty characters is long enough that a typical inline LaTeX expression fits within a single batch. The renderer usually sees whole expressions rather than fragments, and renders them cleanly. The user perceives smooth, slightly chunkier streaming rather than choppy character-by-character output. And in practice, nothing throws.

Eighty was an empirical choice. We tried 40, which was too small: we still saw renders of partial expressions. We tried 200, which works, but the streaming feels less alive. In our content, 80 is the sweet spot: long enough to contain most inline LaTeX, short enough to feel responsive.
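A rough way to see what the threshold buys is to count how many renders a response triggers at each size. The sketch assumes a hypothetical 1,200-character response arriving in 4-character chunks; real token chunks vary in length, so the numbers are illustrative, not measurements.

```typescript
// Hypothetical helper: how many renders a response triggers for a given
// batch size, assuming it arrives in fixed-size chunks.
function countRenders(totalChars: number, chunkSize: number, batchSize: number): number {
  let received = 0;
  let buffered = 0;
  let renders = 0;
  while (received < totalChars) {
    const n = Math.min(chunkSize, totalChars - received); // last chunk may be short
    received += n;
    buffered += n;
    if (buffered >= batchSize) {
      renders++; // buffer full: render and reset
      buffered = 0;
    }
  }
  if (buffered > 0) renders++; // final flush on stream completion
  return renders;
}

for (const size of [1, 40, 80, 200]) {
  console.log(size, countRenders(1200, 4, size));
}
// Prints (one line each): "1 300", "40 30", "80 15", "200 6"
```

A threshold of 1 is the token-by-token baseline. The render count falls in proportion to the batch size, which is why 200 can feel less alive: at 4 characters per chunk, the screen updates only once every 50 chunks.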

[Figure: the streaming pipeline. Claude tokens enter a buffer counter that flushes every 80 characters to the KaTeX renderer, and the rendered math appears in chunks rather than character by character.]
Buffer fills to 80, flushes, renders, repeats. Math arrives in coherent units.

The lesson generalizes

Streaming UX assumes the rendered content type is append-friendly. Prose is. Markdown is, mostly. Anything that requires the renderer to see a coherent expression — LaTeX, JSX, SVG, syntax-highlighted code with multi-line constructs — is not.

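For any of those content types, you want a batching layer between the stream and the renderer. The batching can be character-count-based (our 80 characters), expression-boundary-based (parse for safe split points), or duration-based (flush every 50 ms). Each trades something away: boundary-based splitting is the most precise but needs the parser you were trying to avoid, and duration-based flushing keeps a steady cadence regardless of how fast tokens arrive but, like a character count, pays no attention to where expressions end. The character-count approach is the cheapest and works surprisingly well in practice.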
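The post ships the character-count variant; for contrast, here is a minimal sketch of the duration-based one in TypeScript. `TimeBatcher` is a hypothetical name, the 50 ms default comes from the example above, and the clock is injected so the policy can be tested without real timers. Like a character-count batcher, it hands the renderer the full text so far.

```typescript
// Hypothetical duration-based batcher: flushes when at least `intervalMs`
// has passed since the last flush, checked as each chunk arrives.
class TimeBatcher {
  private rendered = ""; // everything already handed to the renderer
  private buffer = "";   // characters received since the last flush
  private lastFlush: number;

  constructor(
    private readonly render: (fullText: string) => void,
    private readonly intervalMs = 50,
    private readonly now: () => number = () => Date.now(), // injectable clock
  ) {
    this.lastFlush = this.now();
  }

  push(chunk: string): void {
    this.buffer += chunk;
    if (this.now() - this.lastFlush >= this.intervalMs) this.flush();
  }

  complete(): void {
    this.flush(); // stream finished: render any leftover text
  }

  private flush(): void {
    this.lastFlush = this.now();
    if (this.buffer.length === 0) return; // nothing new to render
    this.rendered += this.buffer;
    this.buffer = "";
    this.render(this.rendered);
  }
}

let fakeNow = 0;
const batches: string[] = [];
const tb = new TimeBatcher((fullText) => batches.push(fullText), 50, () => fakeNow);
tb.push("a"); fakeNow = 20; tb.push("b"); fakeNow = 60; tb.push("c"); fakeNow = 70; tb.push("d");
tb.complete();
// batches: ["abc", "abcd"]
```

This version only checks the clock when a chunk arrives, so if the stream stalls, buffered text waits for the next chunk or for `complete()`. A production version would likely add a timer (for example, `setTimeout`) to flush during pauses.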

The principle: streaming UX is a renderer property, not a transport property. The stream is what arrives at the boundary. What the renderer wants is what the renderer wants, and your job is to build the layer between the two.

What we'd start with on day one

If you're shipping streaming AI into anything that isn't pure prose, add a buffer. Eighty characters is a great default for math; for other content types, pick a number by trial. Flush when the buffer is full or the stream completes. Re-render the whole section on each flush; trying to render incrementally inside the renderer is a separate, harder problem.
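The render side of that advice might look like the following TypeScript sketch: on every flush, rebuild the whole section from the accumulated text. The `$...$` delimiters, `renderSection`, and the injected `renderMath` function are assumptions for illustration. Splitting on delimiters is not LaTeX parsing; it only decides which spans go to the math renderer, and a span whose closing `$` hasn't arrived yet is shown as raw text.

```typescript
// Hypothetical render step for the flush callback: re-render the WHOLE
// accumulated text each time. Assumes inline math uses $...$ delimiters and
// that `renderMath` wraps a real math renderer; both are illustrative.
type MathRenderer = (tex: string) => string;

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderSection(fullText: string, renderMath: MathRenderer): string {
  // Splitting on "$" alternates prose (even indices) and math (odd indices).
  // This sketch does not handle escaped \$ or $$...$$ display math.
  const parts = fullText.split("$");
  // An even number of parts means an odd number of "$": the last math span
  // is still open because the batch ended mid-expression.
  const lastIsOpen = parts.length % 2 === 0;
  return parts
    .map((part, i) => {
      if (i % 2 === 0) return escapeHtml(part); // prose
      if (lastIsOpen && i === parts.length - 1) return escapeHtml("$" + part); // unfinished: show raw
      try {
        return renderMath(part);
      } catch {
        return escapeHtml("$" + part + "$"); // renderer rejected it: fall back to source
      }
    })
    .join("");
}

const fakeMath: MathRenderer = (tex) => `<span class="math">${tex}</span>`;
renderSection("Half is $\\frac{1}{2}$, and $\\fr", fakeMath);
// → 'Half is <span class="math">\frac{1}{2}</span>, and $\fr'
```

With KaTeX, `renderMath` could be `(tex) => katex.renderToString(tex, { throwOnError: false })`. With `throwOnError: false`, KaTeX renders invalid input as colored source text instead of throwing, so the `catch` branch only matters for renderers that do throw. KaTeX also ships an auto-render extension that performs this kind of delimiter scan over DOM text.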

And if you find yourself writing a tokenizer inside your streaming layer, stop. The batching trick is the cheap answer.

Stream into the renderer the renderer wants, not the renderer you wish you had.

Mathmatika is in advanced prototype. The streaming-with-formula-rendering pattern is portable to any LLM-powered learning, scientific, or technical-writing product.