Data Formats · LLM · TOON · JSON · YAML · AI Engineering

TOON: Token-Oriented Object Notation for Efficient LLM Data Exchange

November 15, 2025
9 min read

Gagan Goswami


TOON (Token-Oriented Object Notation) is a new, token‑efficient data format designed specifically for exchanging structured data with LLMs, often cutting prompt size by roughly 30–60 percent compared to equivalent JSON on large, uniform datasets.

Where JSON was built for web APIs and YAML for human‑authored configuration, TOON sits in a different niche: it combines YAML‑style indentation with CSV‑style tabular layout to give LLMs cleaner structure with far less syntactic noise, especially for arrays of similar objects.


What TOON actually is

TOON stands for Token‑Oriented Object Notation, a compact, human‑readable serialization format designed to pass structured data to large language models with significantly reduced token usage. Conceptually, it encodes the same primitives as JSON (objects, arrays, strings, numbers, booleans) but in a syntax that removes most of the braces, brackets, quotes, and repeated keys that inflate token counts without adding semantic value for LLMs.

Unlike general‑purpose formats such as JSON, YAML, or XML, TOON is explicitly optimized for the economics and behavior of LLMs, where every extra character can become one or more tokens and directly translates into higher cost and latency. Benchmarks reported by the spec authors and early adopters show token reductions on the order of 30-60% versus pretty‑printed JSON for uniform arrays, while still remaining readable enough for humans to inspect and debug.


Core design principles

TOON’s design takes several deliberate bets that differ from traditional formats.

  • It is token‑efficient first, explicitly targeting lower token counts in LLM prompts rather than byte‑level compactness on the wire.
  • It uses indentation for structure like YAML, so nested objects no longer require curly braces and commas to indicate hierarchy.
  • It introduces tabular layouts for uniform arrays, declaring field names once as headers and then listing row values, similar to CSV but with explicit array semantics.
  • It keeps syntax minimal and predictable, which makes it easier for LLMs to parse deterministically and for downstream code to validate.

In effect, TOON is “JSON re‑imagined for AI”: the same conceptual data model but with punctuation and redundancy stripped away in favor of indentation and tabular structure that better match how models consume text.


TOON syntax by example

At the object level, TOON looks superficially similar to a whitespace‑sensitive subset of YAML.

A simple JSON object:

{
  "version": 1.0,
  "is_active": true,
  "user_id": 99
}

Might be represented in TOON as:

version: 1.0
is_active: true
user_id: 99

This style uses key: value pairs with indentation indicating nesting, without braces or commas.
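As a rough illustration, a flat object like the one above can be emitted with a few lines of Python. The `to_toon_flat` helper below is hypothetical, not an official TOON library API; it uses JSON-style rendering for scalar values, which means strings would stay quoted even where TOON's own rules might drop the quotes.

```python
import json

def to_toon_flat(obj: dict) -> str:
    """Emit TOON-style `key: value` lines from a flat dict (illustrative sketch)."""
    lines = []
    for key, value in obj.items():
        # json.dumps renders numbers and booleans in JSON style (true/false),
        # matching the examples above; strings keep their quotes here for safety.
        lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines)

print(to_toon_flat({"version": 1.0, "is_active": True, "user_id": 99}))
```

Note that this sketch handles only flat objects; nested objects would need indentation handling on top of it.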

Where TOON really diverges is in arrays of uniform objects, which are extremely common in LLM prompts (lists of records, tool calls, steps, messages, etc.). The TOON spec and introductory articles show an example employee list.

JSON:

{
  "employees": [
    { "eid": 1, "ename": "XYZ", "role": "admin" },
    { "eid": 2, "ename": "ABC", "role": "user" }
  ]
}

TOON:

employees[2]{eid,ename,role}:
  1,XYZ,admin
  2,ABC,user

Here employees[2]{eid,ename,role}: declares an array named employees with a length of 2 and three fields per entry, followed by one CSV‑style comma‑separated row per employee. This avoids repeating field names for each object and eliminates brackets, braces, and quotes in most cases, cutting tokens dramatically while preserving clear structure for both code and models.
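A tabular block like this can be produced from ordinary Python dicts with a short encoder. The `to_toon_table` helper below is a minimal sketch, not part of any official TOON library; it assumes every row has the same keys and that no value contains a comma.

```python
def to_toon_table(name: str, rows: list[dict]) -> str:
    """Encode a uniform list of dicts as a TOON-style tabular block (sketch)."""
    fields = list(rows[0].keys())  # assumes every row has the same keys
    # Header declares the array name, its length, and the field names once.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # One indented, comma-separated row per record, CSV-style.
    body = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + body)

employees = [
    {"eid": 1, "ename": "XYZ", "role": "admin"},
    {"eid": 2, "ename": "ABC", "role": "user"},
]
print(to_toon_table("employees", employees))
```

Field names are emitted exactly once in the header, which is where the bulk of the token savings over JSON comes from.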


Why TOON matters for LLM workloads

In LLM systems, the cost of representing data isn't about bytes on the wire; it's about tokens inside the model. Every {, }, [, ], ", and repeated key name in JSON contributes tokens, and when you're passing thousands of records or complex tool outputs into prompts, that overhead compounds quickly.

TOON developers report token savings on the order of 30–60% compared to pretty‑printed JSON, and roughly similar or slightly higher token counts than CSV on purely flat tables, while retaining explicit structure that improves model reliability. In benchmarked extraction and reasoning tasks, TOON has even shown modest accuracy improvements over JSON for some models, likely because the structure is more regular and fields are clearly delineated with explicit lengths and headers.
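To make the overhead concrete, here is a quick back‑of‑the‑envelope comparison using character counts as a crude proxy for tokens. Real savings depend on the model's tokenizer, so the numbers below illustrate the shape of the difference, not the exact percentages from the benchmarks.

```python
import json

# 100 uniform records, the case where TOON is claimed to shine.
records = [{"eid": i, "ename": f"EMP{i}", "role": "user"} for i in range(1, 101)]

# Compact JSON still repeats every key name in every record.
as_json = json.dumps({"employees": records})

# TOON-style table: field names appear once, in the header.
toon_rows = [f"  {r['eid']},{r['ename']},{r['role']}" for r in records]
as_toon = "\n".join([f"employees[{len(records)}]{{eid,ename,role}}:"] + toon_rows)

print(len(as_json), len(as_toon))  # the TOON string is markedly shorter
```

The gap widens as the number of records grows, because JSON's per-record key repetition scales linearly while TOON pays for the header only once.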

For agentic workflows (chains of tools, vector lookups, planning steps), this token efficiency translates into lower cost, more room in context for actual content, and potentially faster responses, especially for large or repetitive datasets.


Comparing TOON vs JSON vs YAML

From a distance, TOON looks like a hybrid of JSON, YAML, and CSV, but it occupies a distinct role.

Syntax & readability

JSON uses curly braces for objects, square brackets for arrays, and mandatory quotes around string keys, producing a highly explicit but noisy syntax. YAML removes most punctuation, using indentation and a richer syntax with anchors, tags, and complex scalars, which can make it very readable but also easier to mis‑indent or misinterpret. TOON sticks to indentation and lightweight key: value or header‑plus‑rows notations, omitting advanced features and syntax in favor of regular, predictable structures that are easy for both humans and models to scan.

Data model & features

JSON’s data model is simple: objects, arrays, strings, numbers, booleans, and null, with no comments or advanced features, which keeps parsers small and behavior consistent. YAML supports a superset of JSON’s data types plus references, custom tags, and multiple document streams, and many YAML dialects allow comments and complex multi‑line strings, which is why it dominates configuration use cases like CI pipelines and infrastructure descriptors. TOON intentionally keeps close to JSON’s core model, adding only what’s needed to express nested objects and uniform arrays compactly, and avoids YAML’s more esoteric features that could confuse LLMs or introduce parsing ambiguities.

Token and size efficiency

Compact JSON (with whitespace removed) is already significantly more compact than pretty‑printed JSON, but it still repeats keys for each object in an array, which is wasteful in LLM prompts. YAML can appear visually compact because it hides structure in whitespace, but if you count actual characters-including required indentation-it often isn’t smaller than equivalent JSON and is sometimes larger. TOON reduces repeated keys via header‑style declarations and strips much punctuation, yielding about 30-60% fewer tokens than formatted JSON on uniform tabular data and modest reductions versus compact JSON in many AI‑centric scenarios.

Ecosystem and tooling

JSON is universally supported-browsers, databases, REST APIs, logging systems, and virtually every programming language ship with native JSON support and rich tooling for validation, schema management, and performance optimization. YAML enjoys strong support in DevOps ecosystems (Kubernetes manifests, GitHub Actions, Ansible, etc.), with mature parsers and linters, although implementations can vary and the standard has multiple versions. TOON is new, but libraries and encoders/decoders are rapidly appearing in ecosystems like JavaScript, Python, Elixir, R, and more, often piggybacking on existing JSON tooling and adding TOON as a translation layer specifically for LLM interactions.

Here is a high‑level comparison:

Aspect | TOON | JSON | YAML
Primary goal | Token‑efficient LLM data exchange | General‑purpose web/app data format | Human‑authored configuration
Structure style | Indentation + tabular headers | Braces, brackets, quotes | Indentation with rich syntax
Best at | Uniform arrays in prompts | APIs, storage, logging | Config files, manifests
Token efficiency (LLMs) | High; 30–60% savings vs JSON on tabular data | Moderate; can be compact but still verbose for arrays | Similar to or higher than JSON when whitespace is counted
Ecosystem maturity | Emerging libraries and tools | Extremely mature and universal | Mature in DevOps/infrastructure
Human readability | High for tabular and nested objects | Clear but noisy punctuation | Very readable but more complex semantics

When TOON is better than JSON (and when it is not)

TOON is not a drop‑in replacement for JSON across all domains; it shines in specific LLM‑centric workflows.

Use TOON when:

  • Passing large, uniform arrays of objects (transaction logs, product catalogs, tool call instructions) into prompts where token costs matter.
  • You need a human‑readable but compact representation that agents or tools will round‑trip through LLMs frequently (e.g., plans, step lists, candidate actions).
  • You are building AI pipelines where JSON is used internally but converted to TOON at the boundary before hitting the model, acting as a translation layer.

Prefer JSON when:

  • Your data is deeply nested, highly irregular, or requires rich schema tooling, validation frameworks, and strong type guarantees.
  • You are designing public APIs, microservices, or browser‑facing interfaces where ubiquitous JSON support and standards like OpenAPI matter more than token counts.
  • The data never touches an LLM at all, or token cost is negligible relative to other system constraints.

Most TOON advocates recommend a hybrid approach: keep JSON as your system’s canonical format and only convert to TOON at the LLM boundary to optimize prompt size, especially for large or repetitive payloads.


Where YAML fits relative to TOON

YAML’s sweet spot has historically been configuration files, infrastructure manifests, and places where humans edit structured data directly, including Kubernetes manifests, GitHub Actions, and CI/CD pipelines. Its support for comments, anchors, and multi‑document streams makes it powerful but also more complex; parsers need to handle many edge cases, and mis‑indentation can lead to tricky bugs.

In AI workflows, YAML is sometimes used for model configs and prompt templates, but it is rarely used to represent large tabular datasets inside prompts because its whitespace‑encoded structure doesn’t meaningfully reduce token counts once fully serialized. TOON specifically targets that gap: it brings YAML‑style indentation to keep nested objects readable while adding CSV‑like tabular syntax for arrays that makes prompts shorter and more regular than either JSON or YAML alone.

In other words, YAML is still an excellent choice for human‑maintained configuration around your LLM systems, while TOON is a better fit for the high‑volume, structured payloads you send into and receive from the model itself.


Practical guidance for using TOON in real systems

For an AI agent or LLM‑heavy application, a pragmatic integration strategy usually looks like this.

  • Keep your internal domain models, APIs, and storage in JSON or whatever your platform already standardizes on, to leverage existing tooling and contracts.
  • At the boundary where you construct prompts, serialize data from JSON into TOON using a library in your language of choice, then embed that TOON block inside the prompt.
  • When parsing structured outputs from LLMs, encourage the model to respond in TOON for large arrays or structured summaries, then decode back to JSON for downstream processing.
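Decoding in the other direction can be sketched just as simply. The `parse_toon_table` function below is illustrative only, handling just the flat tabular form; a real TOON decoder library would also deal with nesting, quoting, and type inference.

```python
import re

def parse_toon_table(text: str) -> dict:
    """Decode a TOON-style tabular block back into JSON-ready dicts (sketch)."""
    lines = text.strip().splitlines()
    # Header looks like: name[length]{field1,field2,...}:
    m = re.match(r"(\w+)\[(\d+)\]\{([^}]*)\}:", lines[0])
    name, length, fields = m.group(1), int(m.group(2)), m.group(3).split(",")
    # Values come back as strings here; a real decoder would infer types.
    rows = [dict(zip(fields, line.strip().split(","))) for line in lines[1:]]
    # The declared length doubles as a cheap integrity check on model output.
    assert len(rows) == length, "row count does not match declared length"
    return {name: rows}

block = """employees[2]{eid,ename,role}:
  1,XYZ,admin
  2,ABC,user"""
print(parse_toon_table(block))
```

The explicit length in the header is handy here: if the model truncates or pads the table, the mismatch is caught at decode time rather than downstream.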

Teams adopting TOON report that this “JSON inside, TOON at the LLM edge” pattern gives most of the token benefits with minimal disruption to existing code and contracts. As the ecosystem matures, with more converters, validators, and IDE support, TOON is likely to become a standard tool in the stack for anyone building serious agentic or data‑heavy AI systems where every token and millisecond counts.
