
TOON vs JSON: A Token-Optimized Data Format for Reducing LLM Costs

TL;DR

TOON is a token-efficient alternative to JSON designed specifically for LLM prompts, preserving the same data model while removing repetitive syntax. By replacing braces, quotes, and repeated keys with indentation and header-based arrays, it reduces token usage and improves structured extraction accuracy.

Last month, I watched a production RAG pipeline burn $1,940 in one weekend. A single 500-row customer table, encoded the usual way as pretty-printed JSON, did the damage. The exact same data would have cost $760 in TOON. Same model. Same answers. Same latency. 61% fewer tokens.

You might have felt it yourself. You add one extra field to your context payload. The token counter spikes by hundreds. Suddenly, you trim keys or pray the model reads the structure right. We all patch around it because JSON has been the default for twenty years.

Most developers forget one detail. JSON landed in 2001: six years before the iPhone, 17 years before GPT-1. Douglas Crockford built JSON for Ajax round-trips between browsers and servers, not for trillion-parameter models that bill you per token. Every quoted key, every repeated field name in an array, every curly brace made perfect sense in a world without inference pricing.

In 2025, those symbols cost real money.

TOON kills that cost. It preserves every piece of the JSON data model (objects, arrays, strings, numbers, booleans, and nulls) but rewrites the text for the one reader you actually pay for: the LLM itself. It replaces repeated keys with a single header row, drops unnecessary quotes, uses indentation instead of braces, and adds explicit length guards so the model never has to guess array sizes.

This article shows exactly why JSON became an accidental tax on AI work, how TOON removes that tax at the syntax level, and how you add it to your code today without rewriting your stack.

If you pay for tokens, keep reading. Your next bill depends on it.

JSON’s Legacy: A Web Standard, Not an AI One

JSON remains the gold standard for general-purpose data interchange. Its quoted keys, braces, brackets, and commas guarantee unambiguous parsing across every programming language and make payloads easy to inspect in browser consoles.

When JSON was created, those properties solved real problems. Bandwidth was the primary constraint, and token-based pricing did not exist.

Today, the constraint has changed. Take a single object:

1{ 2"id": 1, 3"name": "Alice" 4}

It uses ~26 tokens instead of the 6–8 that a human would count. Quotes, colons, commas, and braces each become separate subwords in modern BPE tokenizers.

[Figure: the object above, split into individual tokens]
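You can reproduce this kind of count yourself with a tokenizer library such as OpenAI's tiktoken. This is a quick sketch, not part of TOON; exact totals vary by tokenizer:

```python
import json
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by GPT-4o-class models

pretty = json.dumps({"id": 1, "name": "Alice"}, indent=2)
tokens = enc.encode(pretty)

print(len(tokens))                        # total tokens billed for this object
print([enc.decode([t]) for t in tokens])  # quotes, braces, and commas each show up
```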

When that object appears in a 500-row array, the key strings and surrounding punctuation repeat hundreds of times. Real-world benchmarks record 11,842 tokens for the pretty-printed JSON version of such a table, against 4,617 for the same data in TOON. The language model receives no additional information from those repetitions; they exist solely for syntactic correctness in traditional parsers.

JSON remains the best choice for REST APIs, configuration files, and any system where token counting is irrelevant. Inside LLM prompts, however, the same syntax becomes unnecessary overhead, directly increasing costs and reducing available context.

What is TOON?

TOON (Token-Oriented Object Notation) is a drop-in text representation for structured data that preserves the full JSON data model (objects, arrays, strings, numbers, booleans, and nulls) while removing the punctuation and repetition that inflate token counts inside LLM prompts.

[Diagram: what TOON is]

Rather than wrapping every object in braces and repeating keys on every row, TOON:

  • Uses indentation instead of {} and ,
  • Declares array structure up front so fields don’t repeat
  • Preserves ordering and schema explicitly
  • Streams cleanly in line-based form for RAG pipelines
  • Round-trips losslessly back to JSON

It is not a new database standard. It is not a compression algorithm. TOON gives models the data they need in the form they prefer: less syntax, more signal, fewer tokens.

How TOON Reduces Token Load Without Changing the Data

When JSON is used as model input, its syntax becomes a tax; the characters required for parsing increase the token count and reduce the available reasoning space.

TOON’s approach is to keep the full expressiveness of JSON while changing how the structure appears on the page. It focuses on the tokenizer as the primary consumer instead of the runtime environment.

Note: TOON optimizes repeated structure extremely well, but it isn’t a universal compressor. Highly nested or schema-less data will see smaller savings.

[Diagram: TOON vs. JSON]

Below is a closer look at the mechanisms behind that change.

Indentation-Based Hierarchy Instead of Symbol-Based Delimiters

JSON depends on punctuation to express scope. Braces define objects. Brackets define arrays. Commas separate members. Tokenizers break each of these into its own subwords.

TOON moves this structural meaning into whitespace:

  • Two spaces represent one nesting level
  • Each key begins a new line when introducing a child object
  • Context defines interpretation, not braces

Example translation of nested objects:

1{ 2"user": { 3 "profile": { 4 "city": "Paris" 5 } 6} 7}

becomes

```
user:
  profile:
    city: Paris
```

This reduces syntactic characters while preserving deterministic parseability: the parser tracks indentation levels instead of punctuation, a simpler signal for models to learn.
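To make the translation concrete, here is a minimal encoder sketch for the nested-object case. It is not the official library, and it assumes plain dicts with scalar leaves (no arrays, no quoting rules):

```python
def to_toon(value, indent=0):
    # Two spaces per nesting level; a key with a dict child starts a new block
    lines, pad = [], "  " * indent
    for key, val in value.items():
        if isinstance(val, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_toon(val, indent + 1))
        else:
            lines.append(f"{pad}{key}: {val}")
    return lines

print("\n".join(to_toon({"user": {"profile": {"city": "Paris"}}})))
# user:
#   profile:
#     city: Paris
```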

Header-Driven Arrays Replace Repetition With Declarative Structure

Uniform arrays are common in real data. JSON must repeat every field name and punctuation for every element. TOON compresses this by extracting shape into a single declaration:

```
items[<row count>]{<field order>}:
```

Then come only the values:

```
items[3]{sku,qty,price}:
  A12,4,19.99
  B18,1,12.50
  C22,3,9.25
```

Under the hood:

  • Keys appear once
  • Column order is guaranteed
  • Rows are fixed-width logical tuples

On 500-row datasets, this structure often cuts the token count by more than half. The improvement scales linearly with array length.

Technical detection logic

The encoder collapses an array when:

  1. All elements are objects
  2. They share an identical key set
  3. Order of keys is stable
  4. Null fields remain valid inline values

Otherwise, TOON falls back to object-by-object expansion. No ambiguity or silent corruption.
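A sketch of that check and the resulting header-driven encoding, under the simplifying assumptions that values are scalars and need no quoting (again, not the official encoder):

```python
def is_tabular(arr):
    # Collapse only when every element is an object sharing one key set, in order
    if not arr or not all(isinstance(x, dict) for x in arr):
        return False
    first = list(arr[0].keys())
    return all(list(x.keys()) == first for x in arr)

def encode_rows(name, arr):
    # Keys appear once in the header; each row is a fixed-width tuple of values
    fields = list(arr[0].keys())
    header = f"{name}[{len(arr)}]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(x[f]) for f in fields) for x in arr]
    return "\n".join([header] + rows)

items = [
    {"sku": "A12", "qty": 4, "price": 19.99},
    {"sku": "B18", "qty": 1, "price": 12.50},
    {"sku": "C22", "qty": 3, "price": 9.25},
]
if is_tabular(items):
    print(encode_rows("items", items))
```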

Schema and Cardinality Propagated Into the Prompt

JSON implies structure. TOON exposes it. Models benefit from clearly defined boundaries.

Two design choices matter:

  • [N] explicitly sets the expected row count
  • {field1,field2,…} statically enforces column order

These guide extraction tasks in a way punctuation cannot. A model that invents an extra row contradicts the declared cardinality. A misplaced field becomes visibly misaligned.

This reduces hallucination in:

  • Table reconstruction
  • RAG answer grounding
  • Tool responses requiring valid JSON output

Benchmarks show improvements in exact match metrics and fewer malformed outputs when LLMs decode TOON vs JSON.
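Those guards are also cheap to enforce in code before a model's output moves downstream. A hypothetical validator sketch (naive comma splitting, so quoted values containing commas are out of scope):

```python
import re

HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")

def check_block(text):
    # Verify declared cardinality [N] and the field count of every row
    lines = [l for l in text.strip().splitlines() if l.strip()]
    m = HEADER.match(lines[0].strip())
    if not m:
        raise ValueError("missing array header")
    name, count, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    rows = [l.strip() for l in lines[1:]]
    if len(rows) != count:
        raise ValueError(f"{name}: declared {count} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if len(row.split(",")) != len(fields):
            raise ValueError(f"{name} row {i}: expected {len(fields)} fields")
```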

Optimized for Tokenizers Rather Than Parsers

BPE and unigram tokenizers do not treat structural characters atomically:

  • Quotes often tokenize as ", plus the first 1–2 characters of the key
  • Braces become unique token fragments not reused elsewhere
  • Repeated key names are repeatedly segmented across the prompt

TOON leverages linguistic token merging:

  • Alphanumeric keys tend to map to single tokens
  • Indentation and line breaks fall into low-cost whitespace categories
  • CSV-like patterns trigger high tokenizer reuse

Example token comparison for a 100-row table:

JSON minified: ~2,540 tokens

TOON equivalent: ~1,020 tokens

Same semantics, radically different tokenization behavior.
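Counts like these are easy to sanity-check on your own payloads. A sketch comparing the two encodings of a synthetic 100-row table with tiktoken (the exact numbers will differ from the figures above because the data differs):

```python
import json
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
rows = [{"sku": f"S{i:03}", "qty": i % 7, "price": 9.99} for i in range(100)]

as_json = json.dumps(rows, separators=(",", ":"))  # minified JSON
as_toon = "\n".join(
    ["items[100]{sku,qty,price}:"]
    + [f"  {r['sku']},{r['qty']},{r['price']}" for r in rows]
)

print("JSON:", len(enc.encode(as_json)), "tokens")
print("TOON:", len(enc.encode(as_toon)), "tokens")
```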

Deterministic Round-Trip and Streaming Support

The encoder is a pure transformation layer. It does not compress or interpret values. Decoding restores the original JSON structure exactly, up to insignificant whitespace, number formatting, and optional quoting.

[Diagram: encoding and decoding]

Three primary APIs matter:

```typescript
import { encode, decode, encodeLines } from '@toon-format/toon';

const text = encode(data);  // JSON-compatible value → TOON text
const obj = decode(text);   // TOON text → JSON structure

for await (const chunk of encodeLines(largeData)) {
  // Suitable for incremental context injection in RAG
}
```

Large structured payloads can stream without materializing entire documents in memory. This benefits contexts where prompts change on the fly, such as agent pipelines.

Designed to Fail Loudly, Not Silently

JSON's syntax gives a model many ways to drift without looking wrong: a missing comma, out-of-order fields, a truncated trailing structure. Output can look correct while being semantically broken JSON.

TOON’s strict format makes deviations more observable:

  • Misindentation breaks structural parse
  • Mismatched row counts surface immediately
  • Field order mismatch is an error, not a tolerated reordering

Instead of debugging the LLM, the format itself catches the drift.

Why These Choices Matter

LLMs are probability engines, not parsers. They work best when the signal is strong and the requirements are explicit. TOON’s encoding strategy reduces the number of possible interpretations at every structural boundary, while reducing the token cost at the same time.

It is not a new data model. It is simply a more model-literate representation of the one we already use.

Benchmarks From Real Data#

The most honest way to judge a data format is to see how it performs when real pipelines and real models are involved. The TOON benchmark suite focuses on everyday workloads that developers already push into prompts: employee directories, order histories, analytics logs, configuration objects, and nested product catalogs.

There are 209 structured extraction tasks in total. Testing covers four current model families: GPT-5 Nano, Gemini Flash, Claude Haiku, and Grok 4. Token counts are measured with the o200k_base tokenizer, so the results match real billing.

Here is the average outcome across mixed data shapes:

| Format | Accuracy | Tokens | Score* | Savings vs JSON |
| --- | --- | --- | --- | --- |
| TOON | 73.9% | 2,744 | 26.9 | 39.6% fewer |
| JSON compact | 70.7% | 3,081 | 22.9 | none |
| YAML | 69.0% | 3,719 | 18.6 | N/A |
| JSON | 69.7% | 4,545 | 15.3 | baseline |
| XML | 67.1% | 5,167 | 13.0 | N/A |

*Score is correct extractions per 1,000 input tokens: a direct value-for-cost metric.

Uniform arrays show the biggest advantage. A 500-row e-commerce orders dataset that required 11,842 tokens in JSON needed only 4,617 tokens in TOON, a 61% reduction. At 1,000 GPT-4o prompts per day, that single workload saves roughly $1,740 every month.

Accuracy improves as well. GPT-5 Nano reconstruction tests rose from 92.5% to 99.4%. The explicit field alignment and declared row counts help the model avoid dropped or invented entries. Nothing about the underlying information changes. The model simply has less noise to interpret and more room in the context window for data that matters.

How Teams Use TOON in Production

Adopting TOON rarely requires major changes. JSON remains the source of truth in databases and services. The only difference is that data is converted to TOON at the moment it becomes model input. This removes the token overhead that appears only in prompts, not in storage or APIs.

[Diagram: JSON and TOON token usage]

A typical retrieval-augmented workflow looks like this:

```python
from toon import encode

records = db.fetch_customers()
prompt = "Answer using this context:\n" + encode(records)
```

The model reads TOON as structured text without special instruction. If the response needs to return to typed objects, the same library converts it back into JSON. This keeps the rest of the stack untouched.

Agent systems also gain stability. When a tool returns a list of results, TOON’s explicit row counts and column order help the model avoid misalignment errors that would otherwise break the next step in the loop.

Streaming pipelines benefit, too. Because TOON is line-oriented, prompts can be built incrementally without waiting for closing braces or bracket completion. The result is faster handoffs from retrieval to inference.
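One way to exploit that, sketched with hypothetical helpers: append rows until a token budget is reached, then fix up the length guard to match what was actually kept.

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
BUDGET = 2_000  # assumed per-prompt budget for retrieved rows

def build_context(header_template, rows):
    # header_template uses a [N] placeholder, e.g. "orders[N]{sku,qty,price}:"
    kept, used = [], len(enc.encode(header_template))
    for row in rows:
        cost = len(enc.encode(row + "\n"))
        if used + cost > BUDGET:
            break  # rows are self-contained lines, so truncation stays parseable
        kept.append(row)
        used += cost
    return "\n".join([header_template.replace("[N]", f"[{len(kept)}]")] + kept)
```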

When TOON Helps and When JSON Still Makes Sense

TOON shows its strengths when models read large collections of records. In those prompts, much of the length comes from formatting rather than data. Removing that formatting gives the model the same information in a smaller space.

[Diagram: JSON → TOON encoder → optimized text for the LLM]

Some data does not benefit in the same way. Complex, irregular objects leave little structure that can be simplified, so token totals remain close to JSON. And outside of prompts, JSON continues to be a dependable standard for storage, APIs, and logging where token costs do not apply.

The right approach is to test with your own payloads. Measure how many tokens the model actually sees and how reliably it can reconstruct results. TOON is most helpful where structure repeats predictably and cost pressure is high.

Conclusion

Formats usually reflect the problems they were built to solve. JSON was created when the goal was to move data between browsers and servers with as little friction as possible. Its punctuation and repetition are part of that success story.

When that same format is aimed at a language model, the context changes. Models treat every character as a unit of computation, and punctuation becomes something they must process before they can reason about the information it describes. The result is more tokens consumed and less room for the details that matter.

TOON takes the data we already rely on and presents it in a way that models can read with less effort. It removes structure that exists only for traditional parsers while keeping the meaning intact. That difference shows up quickly in token use, in latency, and in the accuracy of structured extraction.

Better results without changing the data itself. That is the practical opportunity now in front of developers.

Arindam Majumder

Developer Advocate at Tensorlake

I’m a developer advocate, writer, and builder who enjoys breaking down complex tech into simple steps, working demos, and content that developers can act on. My blogs have crossed a million views across platforms, and I create technical tutorials on YouTube focused on AI, agents, and practical workflows. I contribute to open source, explore new AI tooling, and build small apps and prototypes to show developers what’s possible with today’s models.
