Arborium: Tree-sitter code highlighting with Native and WASM targets

arborium

Finding good tree-sitter grammars is hard. In arborium, every grammar:

  • Is generated with tree-sitter 0.26
  • Builds for WASM & native via cargo
  • Has working highlight queries

We hand-picked grammars, added missing highlight queries, and updated them to the latest tree-sitter. Tree-sitter parsers compiled to WASM need libc symbols (especially a C allocator)—we provide arborium-sysroot which re-exports dlmalloc and other essentials for wasm32-unknown-unknown.

Output formats

HTML — custom elements like <a-k> instead of <span class="keyword">. More compact markup. No JavaScript required.

Traditional <span class="keyword">fn</span>

arborium <a-k>fn</a-k>

ANSI — 24-bit true color for terminal applications.
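To make the two output formats concrete, here is a minimal sketch that renders already-classified spans both ways. The `Span` type, the `render_html` and `render_ansi` functions, and the color callback are hypothetical names for illustration, not arborium's API; only the `<a-k>` element for keywords comes from the example above.

```rust
/// One piece of source text, optionally tagged with a short highlight name.
/// Hypothetical type for illustration; "k" stands for keyword, as in `<a-k>`.
struct Span<'a> {
    tag: Option<&'a str>,
    text: &'a str,
}

/// Escape the characters that are unsafe inside HTML text content.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            _ => out.push(c),
        }
    }
    out
}

/// Render spans as compact custom elements, e.g. `<a-k>fn</a-k>`.
fn render_html(spans: &[Span]) -> String {
    let mut out = String::new();
    for span in spans {
        let text = escape_html(span.text);
        match span.tag {
            Some(tag) => out.push_str(&format!("<a-{tag}>{text}</a-{tag}>")),
            None => out.push_str(&text),
        }
    }
    out
}

/// Render spans with 24-bit ANSI foreground colors (`ESC[38;2;R;G;Bm`),
/// resetting attributes after each colored span.
fn render_ansi(spans: &[Span], color_for: impl Fn(&str) -> Option<(u8, u8, u8)>) -> String {
    let mut out = String::new();
    for span in spans {
        match span.tag.and_then(|tag| color_for(tag)) {
            Some((r, g, b)) => {
                out.push_str(&format!("\x1b[38;2;{r};{g};{b}m{}\x1b[0m", span.text))
            }
            None => out.push_str(span.text),
        }
    }
    out
}
```

With a keyword span for `fn` followed by untagged text, `render_html` yields `<a-k>fn</a-k>` plus the escaped remainder, which is where the size saving over `<span class="keyword">` comes from.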

Platforms

macOS, Linux, Windows — tree-sitter handles generating native crates for these platforms. Just add the dependency and go.

WebAssembly — that one's hard. Compiling Rust to WASM with C code that assumes a standard library is tricky. We provide a sysroot that makes this work, enabling Rust-on-the-frontend scenarios like this demo.

Get Started

Rust (native or WASM)

Add to your Cargo.toml:

arborium = { version = "2", features = ["lang-rust"] }

Then highlight code:

let html = arborium::highlight("rust", source)?;

Script tag (zero config)

Add this to your HTML and all <pre><code> blocks get highlighted automatically:

<script src="https://cdn.jsdelivr.net/npm/@arborium/arborium@1/dist/arborium.iife.js"></script>

Your code blocks should look like this:

<pre><code class="language-rust">fn main() {}</code></pre>
<!-- or -->
<pre><code data-lang="rust">fn main() {}</code></pre>
<!-- or just let it auto-detect -->
<pre><code>fn main() {}</code></pre>

Configure via data attributes:

<!-- data-theme:    theme name -->
<!-- data-selector: CSS selector for the blocks to highlight -->
<!-- data-manual:   disable auto-highlight -->
<!-- data-cdn:      jsdelivr | unpkg | custom URL -->
<script src="..."
  data-theme="github-light"
  data-selector="pre code"
  data-manual
  data-cdn="unpkg"></script>

With data-manual, call window.arborium.highlightAll() when ready.

See the IIFE demo →

npm (ESM)

For bundlers or manual control:

import { loadGrammar, highlight } from '@arborium/arborium';

const html = await highlight('rust', sourceCode);

Grammars are loaded on-demand from jsDelivr (configurable).

Integrations

Your crate docs

Highlight TOML, shell, and other languages in your rustdoc. Create arborium-header.html:

<script defer src="https://cdn.jsdelivr.net/npm/@arborium/arborium@1/dist/arborium.iife.js"></script>

Then in Cargo.toml:

[package.metadata.docs.rs]
rustdoc-args = ["--html-in-header", "arborium-header.html"]

See it in action

docs.rs team

If you maintain docs.rs or rustdoc, you could integrate arborium directly! Either merge this PR for native rustdoc support, or use arborium-rustdoc as a post-processing step:

# Process rustdoc output in-place
arborium-rustdoc ./target/doc ./target/doc-highlighted

It streams through HTML, finds <pre class="language-*"> blocks, and highlights them in-place. Works with rustdoc's theme system.
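The block detection described above comes down to reading a language name out of a `class` attribute. As a rough sketch of that one step (the function name and the exact matching rules are assumptions, not arborium-rustdoc's actual implementation):

```rust
/// Return the language named by the first non-empty `language-*` class
/// in a `class` attribute value, if there is one.
fn language_from_class(class_attr: &str) -> Option<&str> {
    class_attr
        .split_whitespace()
        .filter_map(|class| class.strip_prefix("language-"))
        .find(|lang| !lang.is_empty())
}
```

For example, `language_from_class("language-toml")` returns `Some("toml")`, while a class list with no `language-` entry returns `None` and the block would be left untouched.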

crates.io · docs.rs · See it in action!

miette-arborium

Syntax highlighting for miette error diagnostics. Beautiful, accurate highlighting in your CLI error messages.

use miette::GraphicalReportHandler;
use miette_arborium::ArboriumHighlighter;

let handler = GraphicalReportHandler::new()
    .with_syntax_highlighting(ArboriumHighlighter::new());

crates.io · docs.rs

dodeca

An incremental static site generator with zero-reload live updates via WASM DOM patching, Sass/SCSS, image processing, font subsetting, and arborium-powered syntax highlighting.

Nothing to configure—it just works. Arborium is built in and automatically highlights all code blocks.

Website · GitHub

Languages

96 languages included, each behind a feature flag. Enable only what you need, or use all-languages for everything.

Each feature flag comment includes the grammar's license, so you always know what you're shipping.

Theme support

The highlighter supports themes for both HTML and ANSI output.

Bundled themes:

  • Alabaster
  • Ayu Dark
  • Ayu Light
  • Catppuccin Frappé
  • Catppuccin Latte
  • Catppuccin Macchiato
  • Catppuccin Mocha
  • Cobalt2
  • Dayfox
  • Desert256
  • Dracula
  • EF Melissa Dark
  • GitHub Dark
  • GitHub Light
  • Gruvbox Dark
  • Gruvbox Light
  • Kanagawa Dragon
  • Light Owl
  • Lucius Light
  • Melange Dark
  • Melange Light
  • Monokai
  • Nord
  • One Dark
  • Rosé Pine Moon
  • Rustdoc Ayu
  • Rustdoc Dark
  • Rustdoc Light
  • Solarized Dark
  • Solarized Light
  • Tokyo Night
  • Zenburn

Custom themes can be defined programmatically using RGB colors and style attributes (bold, italic, underline, strikethrough).
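As an illustration of what "RGB colors and style attributes" can look like in code, here is a small sketch of a style type that renders to either output format. The `Rgb` and `Style` types and their methods are hypothetical and are not arborium's theme API; check the crate docs for the real types.

```rust
/// A 24-bit color.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgb(u8, u8, u8);

/// Style for one highlight category. Hypothetical type for illustration.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Style {
    fg: Option<Rgb>,
    bold: bool,
    italic: bool,
    underline: bool,
    strikethrough: bool,
}

impl Style {
    /// Build the ANSI SGR prefix for this style (empty if nothing is set).
    fn ansi_prefix(&self) -> String {
        let mut codes: Vec<String> = Vec::new();
        if self.bold { codes.push("1".to_string()); }
        if self.italic { codes.push("3".to_string()); }
        if self.underline { codes.push("4".to_string()); }
        if self.strikethrough { codes.push("9".to_string()); }
        if let Some(Rgb(r, g, b)) = self.fg {
            codes.push(format!("38;2;{r};{g};{b}"));
        }
        if codes.is_empty() {
            String::new()
        } else {
            format!("\x1b[{}m", codes.join(";"))
        }
    }

    /// Build a CSS declaration list for HTML output.
    fn css(&self) -> String {
        let mut decls: Vec<String> = Vec::new();
        if let Some(Rgb(r, g, b)) = self.fg {
            decls.push(format!("color:#{r:02x}{g:02x}{b:02x}"));
        }
        if self.bold { decls.push("font-weight:bold".to_string()); }
        if self.italic { decls.push("font-style:italic".to_string()); }
        match (self.underline, self.strikethrough) {
            (true, true) => decls.push("text-decoration:underline line-through".to_string()),
            (true, false) => decls.push("text-decoration:underline".to_string()),
            (false, true) => decls.push("text-decoration:line-through".to_string()),
            (false, false) => {}
        }
        decls.join(";")
    }
}
```

A bold pink keyword style, `Style { fg: Some(Rgb(255, 121, 198)), bold: true, ..Default::default() }`, maps to the SGR codes `1;38;2;255;121;198` for terminals and to `color:#ff79c6;font-weight:bold` for HTML.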

Grammar Sizes

Each grammar includes the full tree-sitter runtime embedded in its WASM module. This adds a fixed overhead to every grammar bundle, on top of the grammar-specific parser tables.

WASM Build Pipeline

Every grammar is compiled to WASM with aggressive size optimizations. Here's the complete build pipeline:

1. cargo build

We compile with nightly Rust using -Zbuild-std to rebuild the standard library with our optimization flags:

-Cpanic=immediate-abort Skip unwinding machinery

-Copt-level=s Optimize for size, not speed

-Clto=fat Full link-time optimization across all crates

-Ccodegen-units=1 Single codegen unit for maximum optimization

-Cstrip=symbols Remove debug symbols

2. wasm-bindgen

Generate JavaScript bindings with --target web for ES module output.

3. wasm-opt

Final size optimization pass with Binaryen's optimizer:

-Oz Aggressive size optimization

--enable-bulk-memory Faster memory operations

--enable-mutable-globals Required for wasm-bindgen

--enable-simd SIMD instructions where applicable
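Put together, the three steps might be driven roughly as follows. This sketch only assembles the command lines and does not run them. The flags are the ones listed above; the crate name, the output directory, the file paths, and the choice to pass the compiler flags through `RUSTFLAGS` are assumptions for illustration, not the project's actual build script.

```rust
/// The compiler flags from step 1, joined into a single `RUSTFLAGS` value.
fn rustflags() -> String {
    [
        "-Cpanic=immediate-abort",
        "-Copt-level=s",
        "-Clto=fat",
        "-Ccodegen-units=1",
        "-Cstrip=symbols",
    ]
    .join(" ")
}

/// Convert a slice of string slices into an owned argument vector.
fn to_args(args: &[&str]) -> Vec<String> {
    args.iter().map(|s| s.to_string()).collect()
}

/// Argument vectors for cargo, wasm-bindgen, and wasm-opt, in order.
/// `crate_name` and `out_dir` are hypothetical; the paths assume cargo's
/// default target directory and wasm-bindgen's `<name>_bg.wasm` output name.
fn pipeline(crate_name: &str, out_dir: &str) -> Vec<Vec<String>> {
    let built = format!("target/wasm32-unknown-unknown/release/{crate_name}.wasm");
    let bound = format!("{out_dir}/{crate_name}_bg.wasm");
    vec![
        // Step 1: run with the RUSTFLAGS environment variable set to `rustflags()`.
        to_args(&[
            "cargo", "+nightly", "build", "--release",
            "--target", "wasm32-unknown-unknown", "-Zbuild-std",
        ]),
        // Step 2: generate ES module bindings.
        to_args(&["wasm-bindgen", "--target", "web", "--out-dir", out_dir, built.as_str()]),
        // Step 3: shrink the bound module in place.
        to_args(&[
            "wasm-opt", "-Oz", "--enable-bulk-memory", "--enable-mutable-globals",
            "--enable-simd", bound.as_str(), "-o", bound.as_str(),
        ]),
    ]
}
```

Each inner vector could be handed to `std::process::Command`, with the first element as the program and the rest as arguments.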

Despite all these optimizations, WASM bundles are still large because each one embeds the full tree-sitter runtime. We're exploring ways to share the runtime across grammars, but that's the architecture trade-off for now.

FAQ

Why not highlight.js or Shiki?

Both rely on regex-based tokenization: highlight.js uses its own regex grammars and Shiki uses TextMate grammars. Regexes can't count brackets, track scope, or understand structure; they just pattern-match.

Tree-sitter actually parses your code into a syntax tree, so it knows that fn is a keyword only in the right context, handles deeply nested structures correctly, and recovers gracefully from syntax errors.
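The bracket-counting point is easy to see in code. The sketch below is not tree-sitter's algorithm; it only shows the kind of state (a stack) that a real parser keeps and a single regular expression cannot. The function name is made up for this example, and it deliberately ignores strings and comments, which a full grammar would handle.

```rust
/// Pair each bracket in `src` with its nesting depth.
/// A closing bracket that does not match the innermost open one gets `None`.
fn bracket_depths(src: &str) -> Vec<(char, Option<usize>)> {
    let mut stack: Vec<char> = Vec::new();
    let mut out = Vec::new();
    for c in src.chars() {
        match c {
            '(' | '[' | '{' => {
                // Depth is the number of brackets currently open.
                out.push((c, Some(stack.len())));
                stack.push(c);
            }
            ')' | ']' | '}' => {
                let expected = match c {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.last() == Some(&expected) {
                    stack.pop();
                    out.push((c, Some(stack.len())));
                } else {
                    // Mismatch: report it and keep going, a crude form of error recovery.
                    out.push((c, None));
                }
            }
            _ => {}
        }
    }
    out
}
```

For `f(a[0])` this reports depth 0 for the parentheses and depth 1 for the square brackets, which is the information needed for things like depth-aware bracket coloring.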

IDEs with LSP support (like rust-analyzer) can do even better with semantic highlighting—they understand types and dependencies across files—but tree-sitter gets you 90% of the way there without needing a full language server.

Why the name "arborium"?

Arbor is Latin for tree (as in tree-sitter), and -ium denotes a place or collection (like aquarium or terrarium).

It's a place where tree-sitter grammars live.

I have a grammar that's not included. Can you add it?

Yes! Open an issue on the repo with a link to the grammar.

We'll review it and add it if the grammar and highlight queries are in good shape.

Why not use the WASM builds from tree-sitter CLI?

When doing full-stack Rust, it's nice to have exactly the same code on the frontend and the backend.

Rust crates compile to both native and WASM, so you get one dependency that works everywhere.

Why are tree-sitter parsers so large?

Tree-sitter uses table-driven LR parsing. The grammar compiles down to massive state transition tables—every possible parser state and every possible token gets an entry.

These tables are optimized for O(1) lookup speed, not size. A complex grammar like TypeScript can have tens of thousands of states.
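A back-of-the-envelope calculation shows why such tables get big. The sketch below models a dense table with one fixed-size entry per (state, symbol) pair; the function names and the numbers in the usage note are illustrative assumptions, and the layout in a generated parser may differ (sparse states can be stored more compactly).

```rust
/// Bytes needed for a dense table with one entry per (state, symbol) pair.
fn dense_table_bytes(states: usize, symbols: usize, entry_bytes: usize) -> usize {
    states * symbols * entry_bytes
}

/// O(1) lookup in a row-major dense table: the index is computed directly
/// from the state and symbol, with no searching.
fn lookup(table: &[u16], symbols: usize, state: usize, symbol: usize) -> u16 {
    table[state * symbols + symbol]
}
```

With an assumed 20,000 states, 300 symbols, and 2-byte entries, `dense_table_bytes(20_000, 300, 2)` comes to 12,000,000 bytes, so even modest per-entry sizes add up to megabytes before any compression.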

The tradeoff is worth it: you get real parsing (not regex hacks) that handles edge cases correctly and recovers gracefully from syntax errors.