PMTiles is a binary container format that efficiently packs tiled geospatial data into a single file. It embeds a read-only virtual filesystem that is designed for memory-efficient tile retrieval and can be queried either remotely (via HTTP Range requests) or locally (via file seeks and reads). All tile data within the file is highly compressed; it only takes ~125GB of hard drive space to store a map of Earth with street-level resolution.
I’m fascinated by this format for two reasons:
- Personal: I’ve been building out my homelab and reflecting on my usage of SaaS apps and 3rd-party services. My primary forms of transport are walking / cycling / riding the train with the occasional car trip as needed. I’m running an experiment to see if I can reduce my Google Maps usage with self-hosted mapping tools.
- Professional: My background is in data engineering; however, I'm new to the world of geospatial data. Innovations in analytics-focused data formats like Parquet and Arrow have led to a Cambrian explosion of practical downstream applications. Similarly, PMTiles feels like an innovation that unlocks some unique applications. I want to understand how it works so that I can both a) be a better user of the format and b) learn practical techniques for how to innovate on data formats more generally.
This post is a synthesis of what I've learned while deep-diving the format by reading the Useful Resources, analyzing hexdumps of archives, and hacking on my own tile-based map renderer. I've created diagrams to clarify my own thinking, and I'll continue to update as I learn more. Hope it's useful!
Imagine there is a spacecraft far, far away that has captured a collection of high resolution images of a planet’s surface. While there is plenty of rigorous analysis that could be done on these images, they also present an opportunity to get an intuitive feel for the surface. And what better way to explore than clicking-and-dragging on a digital map? (Unfortunately no rovers have yet been deployed to the surface, so no street view!)
A little planet in the far reaches of space... beautiful, isn’t it?
The process of transforming the images into a zoomable map takes a couple of steps.
Projecting the planet’s surface data
The first step is to use a map projection to flatten the planet's high-resolution 3D surface onto a 2D plane. There is no perfect way to flatten the planet without distortion, so the choice of map projection comes down to which distortions are acceptable for the given use case. (NASA has a neat visualization showing different types of projections.)
A standard in the world of earth-based digital mapping (Google Maps, OpenStreetMap) is the Web Mercator projection, so we'll use that. Navigating the planet with this projection will feel familiar; however, it has downsides, namely landmass distortion and clipping of the poles beyond +/- 85.05 degrees latitude. That's ok for now... we can always use a different projection later to get a different perspective.
Projection of the planet's surface. In reality, this image would be enormous due to the high resolution and would have too many pixels to show on the screen.
Chopping up the surface into tiles at different resolutions
The next step is to chop up the high-resolution 2D projection into fixed-size regions of surface (tiles) within a grid and aggregate at different levels of resolution (zoom levels).
The grid's dimensions are determined by the zoom level ($Z$) of the grid: $2^Z \times 2^Z$ tiles. Tiles can be identified on the grid using a 3-tuple: $\text{Z}/\text{X}/\text{Y}$. This system of tiling geographic regions into an addressable rectangular grid is commonly referred to as a slippy map.
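To make the addressing scheme concrete, here's the standard slippy-map formula for finding which Web Mercator tile contains a given longitude/latitude (the function name is mine):

```python
import math

def lonlat_to_tile(lon, lat, z):
    """Map a lon/lat (in degrees) to its slippy-map tile at zoom z."""
    n = 1 << z  # the grid is 2^z x 2^z tiles
    x = int((lon + 180.0) / 360.0 * n)
    # y grows southward from the top-left origin
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return z, x, y
```

For example, `lonlat_to_tile(0.0, 0.0, 2)` lands in tile 2/2/2, the middle of the grid.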
There are two broad categories of tiles, those that contain pixels (raster) and those that contain geometric features (vector). This particular spacecraft relies on optical instruments that generate raster tiles:
The planet’s surface aggregated at the first 3 zoom levels.
The tile grid has an origin in the top-left, meaning that each tile can be referenced by its location on the grid:
A few tiles and their Z/X/Y coordinates.
Stacking slices into a pyramid
The final step is to stack these multi-resolution slices into a single structure. The tile dimensions stay constant across zoom levels, typically 256x256 or 512x512 pixels. As the zoom level increases, there will be more resolvable details in the tiles. Each tile is replaced by 4 tiles covering the same geographic region in the next zoom level. This stack of zoom levels from lowest to highest resolution is called a tile pyramid.
Tile pyramid for the planet's first 3 zoom levels.
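The parent-child relationship between layers is simple arithmetic: tile $Z/X/Y$ splits into the four tiles $(Z{+}1,\ 2X{+}dx,\ 2Y{+}dy)$. A quick sketch (helper names are mine):

```python
def children(z, x, y):
    """The four tiles covering the same region at the next zoom level."""
    return [(z + 1, 2 * x + dx, 2 * y + dy) for dy in (0, 1) for dx in (0, 1)]

def parent(z, x, y):
    """The single tile covering this region at the previous zoom level."""
    return (z - 1, x // 2, y // 2)
```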
Navigating a digital slippy map is equivalent to moving around on the tile pyramid
- Zooming: jump between layers of the tile pyramid
- Panning: traversing tiles within a layer
These tile pyramids grow exponentially as zoom level increases.
$$ \begin{aligned} \text{\# of tiles at zoom} &= 4^Z \\ \text{\# of tiles in pyramid} &= \sum_{z=Z_{min}}^{Z_{max}} 4^z \end{aligned} $$
So while the $Z=2$ layer contains 16 tiles, the $Z=20$ layer contains ~1 trillion tiles. And since we’re storing all layers of the pyramid, the cumulative tile count for $0 \leq Z \leq 20$ is ~1.4 trillion tiles!
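These counts are easy to sanity-check directly from the formulas above:

```python
def tiles_at_zoom(z):
    """Number of tiles in a single zoom level's grid."""
    return 4 ** z

def tiles_in_pyramid(z_min, z_max):
    """Cumulative tile count: geometric series sum of 4^z over the range."""
    return sum(4 ** z for z in range(z_min, z_max + 1))
```

`tiles_at_zoom(20)` is ~1.1 trillion, and `tiles_in_pyramid(0, 20)` is ~1.47 trillion.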
Storing these massive tile pyramids is exactly what PMTiles was designed to do.
File Layout
A file consists of a header, several sections for indexing the tile pyramid, and a single section containing all tiles in the pyramid.
It’s helpful to look at a schematic of the format from two different views, the section layout (left) and the compression layout (right).
Two views of the same PMTiles archive.
A few things to note:
- The header is 127 bytes and the root directory's maximum size is 16,257 bytes. Both of these sections can be fetched with a single read.
- The format compresses both its own internal metadata (via Internal Compression) and the tile data (via Tile Compression). These two types of compression may differ.
- Each section is compressed independently of other sections.
- Aside from the header, all sections are variable length. Decoding them requires knowing where they start and how long they are.
- The tile bytes are stored separately from the tile pyramid indexing structures.
The Virtual Filesystem
Entries and the Root Directory
Tiles within an archive are indexed using a virtual filesystem that sorts its entries by zoom level and spatial proximity. The filesystem does not store raw tile bytes directly, but rather holds pointers to the compressed tile bytes that are stored within the Tile Data section.
PMTiles defines an entry as four fields:
- Tile ID: position along a Hilbert curve
- Run Length: number of times the tile repeats; 0 is a special value that indicates the entry is a directory
- Offset: byte offset relative to the start of the Tile Data or Leaf Directories section
- Length: number of bytes in the compressed tile or directory data
Note that entries are either tiles or directories depending on the value of Run Length. If the entry is a tile, then Offset and Length define a pointer to compressed tile bytes in the Tile Data section. Otherwise if the entry is a directory, then they define a pointer to compressed directory bytes in the Leaf Directories section.
At the base of the filesystem is the root directory, which contains pointers to tiles and/or leaf directories. Directories are required to have at least one entry, meaning that the root directory must contain at least one tile entry or one leaf directory entry.
Spatial Sorting via the Hilbert Curve
There are many approaches to sorting by proximity in a 2D space, each with tradeoffs around computational complexity and preservation of spatial locality. PMTiles’ approach is to construct a specific type of space-filling curve, called the Hilbert Curve, for each zoom level. These curves are then connected end-to-end by ascending zoom level such that there is one connected curve for the entire tile pyramid. (Both 3Blue1Brown and Numberphile have excellent videos describing the theory behind Hilbert Curves.)
The Hilbert curve and corresponding tile ordering for three zoom levels.
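Here's a sketch of how a Z/X/Y coordinate could be turned into a Tile ID: compute the tile's position along its zoom level's Hilbert curve, then add the total number of tiles in all lower zoom levels. This uses the standard iterative Hilbert index algorithm; the function names are mine.

```python
def hilbert_index(z, x, y):
    """Position of tile (x, y) along the Hilbert curve of a 2^z x 2^z grid."""
    n = 1 << z
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/reflect the quadrant so sub-curves connect end-to-end
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def tile_id(z, x, y):
    """Hilbert position at zoom z, offset by all tiles in lower zoom levels."""
    tiles_below = ((1 << (2 * z)) - 1) // 3  # 1 + 4 + ... + 4^(z-1)
    return tiles_below + hilbert_index(z, x, y)
```

With this, `tile_id(0, 0, 0)` is 0, zoom 1 occupies IDs 1 through 4, and zoom 2 starts at ID 5.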
With the sort order well-defined, we can now deterministically serialize and deserialize tile pyramids. But why go through all this trouble of sorting entries in Hilbert order? A couple reasons, but the one we’ll talk about next is compression via tile deduplication.
Tile Deduplication
Take a look at the process for encoding the planet’s $Z=2$ zoom level:
Deduplicating tiles on the Hilbert curve for Z = 2.
Notice that many of the tiles appear to repeat after the 2D plane is unrolled? That’s not a coincidence... it’s a benefit of the spatial locality of the Hilbert curve. In regions with low geographic variation, like oceans, neighboring tiles are more likely to be the same. These repeating tiles are run-length encoded to reduce the number of tiles that physically need to be stored in the file. So instead of storing 16 tiles for $Z=2$, only 6 tiles need to be stored.
Deduplication has a significant space savings as the zoom level increases. For example: if an ocean tile splits into 4 identical ocean tiles in the next layer which then split to 16 tiles in the following layer, the duplicates at each layer can be compressed to one tile.
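The run-length grouping itself can be sketched in a few lines. The single-letter strings below are stand-ins for actual tile bytes, chosen so that the grouping reproduces the planet's six $Z=2$ entries; the entry layout is simplified:

```python
def run_length_encode(first_id, tiles):
    """Collapse consecutive identical tiles (in Hilbert order) into
    (tile_id, run_length, content) entries."""
    entries = []
    for i, content in enumerate(tiles):
        if entries and entries[-1][2] == content:
            tid, run, c = entries[-1]
            entries[-1] = (tid, run + 1, c)  # extend the current run
        else:
            entries.append((first_id + i, 1, content))  # start a new run
    return entries
```

Feeding in 16 placeholder tiles starting at ID 5 yields the six entries 5[2], 7[1], 8[4], 12[1], 13[1], 14[7].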
Here's what the planet's filesystem would look like if all tiles were in the root directory:
Serialization of the planet’s deduplicated tile pyramid in the Root Directory.
Leaf directories to the rescue
In order to read an archive, both the header and root directory must be fetched at the start. This is no problem with 3 zoom levels, but it becomes prohibitively expensive at higher zoom levels. Given that a virtual filesystem representing Earth could contain trillions of tiles, fetching all the entries for all those tiles at once is a dead end.
PMTiles uses leaf directories to mitigate this. Instead of storing all the tile entries directly in the root directory, groups of tile entries are stored in the leaf directories section and the root contains pointers to these groups. So instead of eagerly loading trillions of tile entries, PMTiles readers eagerly load the root directory's tile entries at startup and lazily load additional tile entries within leaf directories at query time.
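The two-level lookup can be sketched as follows. Entries are `(tile_id, run_length, offset, length)` tuples sorted by Tile ID, and `load_leaf` stands in for fetching and decoding a leaf directory's bytes; all names are mine, and real readers would also cache leaves rather than refetch them:

```python
from bisect import bisect_right

def resolve(tile_id, directory, load_leaf):
    """Return the (offset, length) of a tile's bytes in Tile Data, or None."""
    ids = [e[0] for e in directory]
    i = bisect_right(ids, tile_id) - 1  # last entry with id <= tile_id
    if i < 0:
        return None
    eid, run_length, offset, length = directory[i]
    if run_length == 0:
        # directory entry: lazily load the leaf and search one level down
        return resolve(tile_id, load_leaf(offset, length), load_leaf)
    if tile_id < eid + run_length:
        return (offset, length)  # tile is covered by this run
    return None
```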
Here’s what it looks like for the planet’s tiles when split into one leaf directory per zoom level:
The planet's tile pyramid if each zoom level has its own leaf directory.
The spec does not impose any constraints on leaf directory construction other than strongly discouraging nested leaf directories. Making each zoom level a leaf was an arbitrary choice for this scenario. Archives can structure their directories in ways that best support their intended use.
So far we’ve gone over the logical representation of the filesystem, but now it’s time to mention how it is physically stored on disk.
Encoding the filesystem
Directories are encoded as a columnar serialization of all contained entries, prefixed by the entry count. This serialization is then wrapped in an optional layer of internal compression, as indicated in the header.
Columnar representation of a directory.
The simplest way to serialize the entries would be to append the raw bytes together. However, PMTiles takes advantage of the properties of the data to provide additional compression.
The building block: ULEB128
ULEB128, aka varint encoding, shows up frequently when working with binary data. So what is it and how does it work?
Let's say there is a positive number whose maximum possible value we know fits into a 64-bit unsigned integer, as is the case with offsets and lengths in the spec. We could serialize a bunch of these numbers by always writing 64 bits to the output, but this would be a huge waste of space for numbers that need fewer bits... the unused bytes would be padded with zeros. ULEB128 stores unsigned integers so that they only take up the space that they need.
Encoding a number as ULEB128.
Interestingly enough, encoding an integer as ULEB128 does not always make the number smaller. For example: the ULEB128 encoding of 1,000,000,000 is 5 bytes, while the number can be represented in 4 bytes without the encoding. However, because we otherwise would have had to store the value as a 64-bit integer to account for the max possible value, we saved 3 bytes. The smaller the encoded number, the more space we save by avoiding the fixed-width integer cost.
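A minimal ULEB128 codec in Python makes the scheme concrete: each output byte carries 7 payload bits, with the high bit set while more bytes follow.

```python
def uleb128_encode(n):
    """Encode a non-negative integer as ULEB128 bytes."""
    out = bytearray()
    while True:
        byte = n & 0x7F  # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def uleb128_decode(data):
    """Decode ULEB128 bytes back into an integer."""
    n = shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return n
```

`uleb128_encode(1_000_000_000)` produces 5 bytes, versus the 8 bytes a fixed-width u64 would cost.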
In the following sections, keep in mind that every integer mentioned is encoded via ULEB128 with additional context-specific encodings layered on top.
Tile IDs
Tile IDs are monotonically increasing unsigned integers. Instead of storing each ID directly, PMTiles stores the first ID in the directory as the reference and then stores deltas between contiguous IDs.
For example, take the planet's $Z = 2$ deduplicated tile sequence of 5[2], 7[1], 8[4], 12[1], 13[1], 14[7]. Instead of storing the Tile IDs as 5, 7, 8, 12, 13, 14, we store them as 5, 2, 1, 4, 1, 1. At first glance this doesn't seem to do anything for compression... every number in both sequences takes only one byte to store. But let's say we needed to store a sequence of Tile IDs starting from ID = 1,000,000,000:
Using delta encoding on a sequence of four Tile IDs.
Using delta encoding on this sequence of IDs reduces the storage space from 20 bytes to 8 bytes.
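We can verify the savings by counting ULEB128 bytes before and after delta encoding. The four IDs below are hypothetical, chosen to be clustered around 1,000,000,000 for illustration:

```python
def uleb128_len(n):
    """Number of bytes in the ULEB128 encoding of n."""
    count = 1
    while n >= 0x80:
        n >>= 7
        count += 1
    return count

# four hypothetical Tile IDs clustered near 1,000,000,000
ids = [1_000_000_000, 1_000_000_002, 1_000_000_003, 1_000_000_007]
deltas = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]

raw_size = sum(uleb128_len(i) for i in ids)       # 4 x 5 bytes = 20
delta_size = sum(uleb128_len(d) for d in deltas)  # 5 + 1 + 1 + 1 = 8
```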
Run Lengths
Run lengths are encoded as-is.
Lengths
Lengths are encoded as-is.
Offsets
Offsets are encoded such that the initial offset is stored as <offset> + 1, and subsequent contiguous offsets are recorded as a marker byte 00, which indicates that the next tile's offset is the previous tile's offset + length. Where contiguity is disrupted, the full offset (+ 1) is stored again, followed by marker bytes until either the next disruption or the end of the directory.
An example of offset encodings for a directory.
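A decoding sketch makes the scheme concrete. Given the per-entry lengths (already decoded from their own column) and the raw offset column, a 0 resolves to "previous offset + previous length" while any other value v is the absolute offset v - 1 (function name is mine):

```python
def decode_offsets(raw_offsets, lengths):
    """Resolve the offset column: 0 means contiguous with the previous
    entry; any other value v means the absolute offset v - 1."""
    offsets = []
    for i, v in enumerate(raw_offsets):
        if v == 0 and i > 0:
            offsets.append(offsets[-1] + lengths[i - 1])  # contiguous
        else:
            offsets.append(v - 1)  # stored as offset + 1
    return offsets
```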
Full encoding
And with that, we have enough information to write out the fully encoded filesystem. The planet’s encoding of 1 zoom level per leaf looks like this:
Encoding of planet's root directory.
Encoding of planet's leaf directories.
Note that none indicates that the directories don't have another layer of compression applied.
Tile Data
Recall from the file layout that tiles are stored at the end of the file. Each tile’s bytes are optionally wrapped in tile compression as indicated in the header. Entries in the virtual filesystem point to regions within this section.
A side-by-side view of a clustered and unclustered archive for the planet's Z=2 leaf.
If the tile bytes in the data section are sorted the same way as the directory entries, then the archive is considered to be clustered. And since the Hilbert curve ordering mostly preserves spatial locality, clustered archives are efficient at fetching ranges of tiles within a geographic region.
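One way to check the clustered property is to walk the entries in Tile ID order and confirm that each one either starts exactly where the data written so far ends, or points back at an already-written region (a simplified sketch of my own; entry layout is reduced to the relevant fields):

```python
def is_clustered(entries):
    """entries: (tile_id, offset, length) tuples.
    Clustered: walking in Tile ID order, every entry is either contiguous
    with the data written so far or a pointer back into it."""
    high_water = 0  # bytes of tile data written so far
    for _, offset, length in sorted(entries):
        if offset == high_water:
            high_water += length  # newly written tile, contiguous
        elif offset + length > high_water:
            return False  # out-of-order write: not clustered
    return True
```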
Note that the planet's tiles are stored in TILE DATA as gzipped PNGs concatenated together.
The metadata section contains a single JSON document whose contents depend on the tile type. For vector tiles, the JSON is required to have a couple of fields that describe the structure of the vector data within tiles. No fields are required for raster tiles.
Note that other, unspecified fields can be added to the Metadata. These fields can be added for informational purposes or even to enable custom processing of the archive. For example: let's say we want the tiles to be encoded using an unsupported format like SVG. A specialized writer could mark Tile Type as Unknown/Other in the header and add a custom "tile_type": "svg" key-value pair in the JSON. A specialized reader could then use this metadata to render SVG tiles even though they are not directly supported by PMTiles.
The header contains the details needed to decode the rest of the file. Here's what the planet's header looks like. For more details on what each field means, see the spec:
| Field | Human-readable | Encoded |
|---|---|---|
| Magic Number | PMTiles | 50 4d 54 69 6c 65 73 |
| Version | 3 | 03 |
| Root Directory Offset | 127 | 7f 00 00 00 00 00 00 00 |
| Root Directory Length | 13 | 0d 00 00 00 00 00 00 00 |
| Metadata Offset | 140 | 8c 00 00 00 00 00 00 00 |
| Metadata Length | 2 | 02 00 00 00 00 00 00 00 |
| Leaf Directories Offset | 142 | 8e 00 00 00 00 00 00 00 |
| Leaf Directories Length | 61 | 3d 00 00 00 00 00 00 00 |
| Tile Data Offset | 203 | cb 00 00 00 00 00 00 00 |
| Tile Data Length | 41453 | ed a1 00 00 00 00 00 00 |
| Number of Addressed Tiles | 21 | 15 00 00 00 00 00 00 00 |
| Number of Tile Entries | 11 | 0b 00 00 00 00 00 00 00 |
| Number of Tile Contents | 11 | 0b 00 00 00 00 00 00 00 |
| Clustered | true | 01 |
| Internal Compression | none | 01 |
| Tile Compression | gzip | 02 |
| Tile Type | png | 02 |
| Min Zoom | 0 | 00 |
| Max Zoom | 2 | 02 |
| Min Position | longitude = -180, latitude ≈ -85.05113 | 00 2e b6 94 40 3a 4e cd |
| Max Position | longitude = 180, latitude ≈ 85.05113 | 00 d2 49 6b c0 c5 b1 32 |
| Center Zoom | 1 | 01 |
| Center Position | longitude = 0, latitude = 0 | 00 00 00 00 00 00 00 00 |
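As a sketch of how the fixed 127-byte layout decodes, here's a minimal Python parser for a handful of the fields above (field positions follow the v3 spec's fixed layout; the function name and returned dict shape are mine):

```python
import struct

def parse_header(buf):
    """Decode a few fields from a 127-byte PMTiles v3 header."""
    if buf[:7] != b"PMTiles" or buf[7] != 3:
        raise ValueError("not a PMTiles v3 archive")
    # eleven little-endian u64s: section offsets/lengths, then tile counts
    (root_off, root_len, _meta_off, _meta_len, _leaf_off, _leaf_len,
     data_off, data_len, addressed, entries, contents) = struct.unpack_from("<11Q", buf, 8)
    return {
        "root_directory": (root_off, root_len),
        "tile_data": (data_off, data_len),
        "counts": (addressed, entries, contents),
        "clustered": buf[96] == 1,
        "min_zoom": buf[100],
        "max_zoom": buf[101],
    }
```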
And now it’s time to create an actual PMTiles archive from the planet’s tiles. Throughout this post, we've been slowly building up the contents of the planet's archive section-by-section. The only step remaining is to assemble all the sections into a file.
I wrote a bash script that does the byte manipulation. (To follow along, you can download the gzipped tile PNGs (0, 1, 2, 3, 4, 5, 7, 8, 12, 13, 14) and place them in a subdirectory called "tiles"):
Code
output_file=planet.pmtiles
> $output_file  # truncate/create the output file
printf '\x50\x4d\x54\x69\x6c\x65\x73' >> $output_file  # magic number: "PMTiles"
printf '\x03' >> $output_file  # version: 3
printf '\x7f\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # root directory offset: 127
printf '\x0d\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # root directory length: 13
printf '\x8c\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # metadata offset: 140
printf '\x02\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # metadata length: 2
printf '\x8e\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # leaf directories offset: 142
printf '\x3d\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # leaf directories length: 61
printf '\xcb\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # tile data offset: 203
printf '\xed\xa1\x00\x00\x00\x00\x00\x00' >> $output_file  # tile data length: 41453
printf '\x15\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # number of addressed tiles: 21
printf '\x0b\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # number of tile entries: 11
printf '\x0b\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # number of tile contents: 11
printf '\x01' >> $output_file  # clustered: true
printf '\x01' >> $output_file  # internal compression: none
printf '\x02' >> $output_file  # tile compression: gzip
printf '\x02' >> $output_file  # tile type: png
printf '\x00' >> $output_file  # min zoom: 0
printf '\x02' >> $output_file  # max zoom: 2
printf '\x00\x2e\xb6\x94\x40\x3a\x4e\xcd' >> $output_file  # min position (lon, lat)
printf '\x00\xd2\x49\x6b\xc0\xc5\xb1\x32' >> $output_file  # max position (lon, lat)
printf '\x01' >> $output_file  # center zoom: 1
printf '\x00\x00\x00\x00\x00\x00\x00\x00' >> $output_file  # center position (lon, lat)
printf '\x03\x00\x01\x04\x00\x00\x00\x06\x16\x21\x01\x00\x00' >> $output_file  # root directory (3 leaf entries)
printf '{}' >> $output_file  # metadata JSON
printf '\x01\x00\x01\x8d\x23\x01' >> $output_file  # leaf directory for zoom 0
printf '\x04\x01\x01\x01\x01\x01\x01\x01\x01\xee\x1f\xe1\x1c\xa9\x1f\xdd\x17\x8e\x23\x00\x00\x00' >> $output_file  # leaf directory for zoom 1
printf '\x06\x05\x02\x01\x04\x01\x01\x02\x01\x04\x01\x01\x07\xdd\x17\x94\x22\xdd\x17\x9a\x21\xc5\x22\xde\x17\xe3\x96\x01\x00\x00\x00\x00\x00' >> $output_file  # leaf directory for zoom 2
cat tiles/0.png.gz tiles/1.png.gz tiles/2.png.gz tiles/3.png.gz tiles/4.png.gz tiles/5.png.gz tiles/7.png.gz tiles/8.png.gz tiles/12.png.gz tiles/13.png.gz tiles/14.png.gz >> $output_file  # tile data: gzipped PNGs in Hilbert order
Using the official pmtiles CLI tool, we can check that the archive header and virtual filesystem are valid.
> pmtiles show planet.pmtiles
pmtiles spec version: 3
tile type: png
bounds: (long: -180.000000, lat: -85.051130) (long: 180.000000, lat: 85.051130)
min zoom: 0
max zoom: 2
center: (long: 0.000000, lat: 0.000000)
center zoom: 1
addressed tiles count: 21
tile entries count: 11
tile contents count: 11
clustered: true
internal compression: none
tile compression: gzip

> pmtiles verify planet.pmtiles
2026/03/14 14:12:26 verify.go:169: Completed verify in 661.875µs.
Note that the pmtiles command does not validate tile data. So the real test is loading the file via an existing map renderer like MapLibre:
Definitely not the prettiest map... but it works! This is a traversable tile pyramid based on the PNGs that were stuffed into the archive.
Hardcoding the values in a bash script is fun, but it's also error-prone and a little tedious to work with. I wrote a small Zig program that generates valid PMTiles archives from a folder of Hilbert-ordered tiles and a few configuration options. It's not a general-purpose writer, but it's good for experimenting with file layout.
Code
const std = @import("std");
const OUTPUT_FILE_PATH = "zig_planet.pmtiles";
const FILESYSTEM_SPEC = input.ZOOM_PARTITIONED_FILESYSTEM;
pub fn main() !void {
var da: std.heap.DebugAllocator(.{}) = .init;
defer _ = da.deinit();
var arena = std.heap.ArenaAllocator.init(da.allocator());
defer arena.deinit();
var archive_file = try std.fs.cwd().createFile(OUTPUT_FILE_PATH, .{ .lock = .exclusive });
defer archive_file.close();
var writer_buf: [4 * 1024]u8 = undefined;
var archive_writer = archive_file.writerStreaming(&writer_buf);
const vfs: pmtiles.VirtualFileSystem = try .init(arena.allocator(), FILESYSTEM_SPEC);
const header: pmtiles.Header = .init(
vfs,
.none,
.gzip,
.png,
0,
2,
.{ .latitude = -85.051129, .longitude = -180 },
.{ .latitude = 85.051129, .longitude = 180 },
1,
.{ .longitude = 0, .latitude = 0 },
);
try header.write(&archive_writer.interface);
try archive_writer.interface.writeAll(vfs.root_directory_bytes);
try archive_writer.interface.writeAll(input.METADATA);
for (vfs.leaf_directories_bytes) |bytes| {
try archive_writer.interface.writeAll(bytes);
}
for (FILESYSTEM_SPEC) |entries| {
for (entries) |entry| {
switch (entry) {
.tile => |t| try archive_writer.interface.writeAll(t.bytes),
.directory => {},
}
}
}
try archive_writer.interface.flush();
}
const input = struct {
const Tile = struct {
id: u64,
run_length: u64,
bytes: []const u8,
fn embed(id: u64, run_length: u64) Tile {
const id_str = std.fmt.comptimePrint("{d}", .{id});
return .{ .id = id, .run_length = run_length, .bytes = @embedFile("tiles/" ++ id_str ++ ".png.gz") };
}
};
const Directory = struct { min_tile_id: u64 };
const Entry = union(enum) {
tile: Tile,
directory: Directory,
};
const ARBITRARILY_PARTITIONED_FILESYSTEM: []const []const Entry = &.{
&.{
.{ .tile = Tile.embed(0, 1) },
.{ .tile = Tile.embed(1, 1) },
.{ .tile = Tile.embed(2, 1) },
.{ .tile = Tile.embed(3, 1) },
.{ .tile = Tile.embed(4, 1) },
.{ .directory = .{ .min_tile_id = 5 } },
},
&.{
.{ .tile = Tile.embed(5, 2) },
.{ .tile = Tile.embed(7, 1) },
.{ .tile = Tile.embed(8, 4) },
.{ .tile = Tile.embed(12, 1) },
.{ .tile = Tile.embed(13, 1) },
.{ .tile = Tile.embed(14, 7) },
},
};
const ZOOM_PARTITIONED_FILESYSTEM: []const []const Entry = &.{
&.{
.{ .directory = .{ .min_tile_id = 0 } },
.{ .directory = .{ .min_tile_id = 1 } },
.{ .directory = .{ .min_tile_id = 5 } },
},
&.{
.{ .tile = Tile.embed(0, 1) },
},
&.{
.{ .tile = Tile.embed(1, 1) },
.{ .tile = Tile.embed(2, 1) },
.{ .tile = Tile.embed(3, 1) },
.{ .tile = Tile.embed(4, 1) },
},
&.{
.{ .tile = Tile.embed(5, 2) },
.{ .tile = Tile.embed(7, 1) },
.{ .tile = Tile.embed(8, 4) },
.{ .tile = Tile.embed(12, 1) },
.{ .tile = Tile.embed(13, 1) },
.{ .tile = Tile.embed(14, 7) },
},
};
const ROOT_ONLY_FILESYSTEM: []const []const Entry = &.{
&.{
.{ .tile = Tile.embed(0, 1) },
.{ .tile = Tile.embed(1, 1) },
.{ .tile = Tile.embed(2, 1) },
.{ .tile = Tile.embed(3, 1) },
.{ .tile = Tile.embed(4, 1) },
.{ .tile = Tile.embed(5, 2) },
.{ .tile = Tile.embed(7, 1) },
.{ .tile = Tile.embed(8, 4) },
.{ .tile = Tile.embed(12, 1) },
.{ .tile = Tile.embed(13, 1) },
.{ .tile = Tile.embed(14, 7) },
},
};
const METADATA = "{}";
};
const pmtiles = struct {
const ROOT_DIRECTORY_START_OFFSET = 127;
const Header = struct {
magic_number: *const [7:0]u8 = "PMTiles",
version: u8 = 0x03,
root_directory_offset: u64,
root_directory_length: u64,
metadata_offset: u64,
metadata_length: u64,
leaf_directories_offset: u64,
leaf_directories_length: u64,
tile_data_offset: u64,
tile_data_length: u64,
number_of_addressed_tiles: u64,
number_of_tile_entries: u64,
number_of_tile_contents: u64,
clustered: u8 = @intFromBool(true),
internal_compression: Compression,
tile_compression: Compression,
tile_type: TileType,
min_zoom: u8,
max_zoom: u8,
min_position: Position,
max_position: Position,
center_zoom: u8,
center_position: Position,
const Compression = enum(u8) {
unknown,
none,
gzip,
brotli,
zstd,
};
const TileType = enum(u8) {
unknown,
mvt_vector_tile,
png,
jpeg,
webp,
avif,
maplibre_vector_tile,
};
const Position = struct {
longitude: f32,
latitude: f32,
fn write(self: *const Position, writer: *std.io.Writer) !void {
try writer.writeInt(i32, @intFromFloat(self.longitude * 10_000_000), .little);
try writer.writeInt(i32, @intFromFloat(self.latitude * 10_000_000), .little);
}
};
fn init(
vfs: VirtualFileSystem,
internal_compression: Compression,
tile_compression: Compression,
tile_type: TileType,
min_zoom: u8,
max_zoom: u8,
min_position: Position,
max_position: Position,
center_zoom: u8,
center_position: Position,
) Header {
const metadata_offset = pmtiles.ROOT_DIRECTORY_START_OFFSET + vfs.root_directory_length;
const leaf_directories_offset = metadata_offset + input.METADATA.len;
const tile_data_offset = leaf_directories_offset + vfs.leaf_directories_length;
return .{
.root_directory_offset = pmtiles.ROOT_DIRECTORY_START_OFFSET,
.root_directory_length = vfs.root_directory_length,
.metadata_offset = metadata_offset,
.metadata_length = input.METADATA.len,
.leaf_directories_offset = leaf_directories_offset,
.leaf_directories_length = vfs.leaf_directories_length,
.tile_data_offset = tile_data_offset,
.tile_data_length = vfs.tile_data_length,
.number_of_addressed_tiles = vfs.statistics.number_of_addressed_tiles,
.number_of_tile_entries = vfs.statistics.number_of_tile_entries,
.number_of_tile_contents = vfs.statistics.number_of_tile_contents,
.internal_compression = internal_compression,
.tile_compression = tile_compression,
.tile_type = tile_type,
.min_zoom = min_zoom,
.max_zoom = max_zoom,
.min_position = min_position,
.max_position = max_position,
.center_zoom = center_zoom,
.center_position = center_position,
};
}
fn write(self: *const Header, writer: *std.io.Writer) !void {
try writer.writeAll(self.magic_number);
try writer.writeInt(u8, self.version, .little);
try writer.writeInt(u64, self.root_directory_offset, .little);
try writer.writeInt(u64, self.root_directory_length, .little);
try writer.writeInt(u64, self.metadata_offset, .little);
try writer.writeInt(u64, self.metadata_length, .little);
try writer.writeInt(u64, self.leaf_directories_offset, .little);
try writer.writeInt(u64, self.leaf_directories_length, .little);
try writer.writeInt(u64, self.tile_data_offset, .little);
try writer.writeInt(u64, self.tile_data_length, .little);
try writer.writeInt(u64, self.number_of_addressed_tiles, .little);
try writer.writeInt(u64, self.number_of_tile_entries, .little);
try writer.writeInt(u64, self.number_of_tile_contents, .little);
try writer.writeInt(u8, self.clustered, .little);
try writer.writeInt(u8, @intFromEnum(self.internal_compression), .little);
try writer.writeInt(u8, @intFromEnum(self.tile_compression), .little);
try writer.writeInt(u8, @intFromEnum(self.tile_type), .little);
try writer.writeInt(u8, self.min_zoom, .little);
try writer.writeInt(u8, self.max_zoom, .little);
try self.min_position.write(writer);
try self.max_position.write(writer);
try writer.writeInt(u8, self.center_zoom, .little);
try self.center_position.write(writer);
}
};
const VirtualFileSystem = struct {
root_directory_length: u64,
root_directory_bytes: []const u8,
leaf_directories_length: u64,
leaf_directories_bytes: [][]const u8,
tile_data_length: u64,
statistics: Statistics,
const Statistics = struct {
number_of_addressed_tiles: u64,
number_of_tile_entries: u64,
number_of_tile_contents: u64,
};
const Location = struct { offset: u64, length: u64 };
const Entry = struct {
id: u64,
run_length: u64,
offset: u64,
length: u64,
};
fn init(
arena: std.mem.Allocator,
spec: []const []const input.Entry,
) !VirtualFileSystem {
var tile_locations: std.AutoArrayHashMapUnmanaged(u64, Location) = .empty;
var leaf_directory_locations: std.AutoArrayHashMapUnmanaged(u64, Location) = .empty;
var number_of_addressed_tiles: u64 = 0;
var number_of_tile_entries: u64 = 0;
var number_of_tile_contents: u64 = 0;
var tile_data_length: u64 = 0;
for (spec) |entries| {
for (entries) |entry| {
switch (entry) {
.directory => {},
.tile => |t| {
number_of_addressed_tiles += t.run_length;
number_of_tile_contents += 1;
number_of_tile_entries += 1;
try tile_locations.putNoClobber(
arena,
t.id,
.{ .offset = tile_data_length, .length = t.bytes.len },
);
tile_data_length += t.bytes.len;
},
}
}
}
var leaf_directories_bytes: std.ArrayList([]const u8) = .empty;
var leaf_directories_length: u64 = 0;
for (spec[1..]) |input_entries| {
var leaf_entries: std.ArrayList(Entry) = .empty;
for (input_entries) |entry| {
switch (entry) {
.tile => |t| {
const loc = tile_locations.get(t.id) orelse return error.TileLocationNotFound;
try leaf_entries.append(arena, .{
.id = t.id,
.run_length = t.run_length,
.offset = loc.offset,
.length = loc.length,
});
},
.directory => unreachable,
}
}
const leaf_directory_bytes = try encodeDirectory(arena, leaf_entries.items, .none);
try leaf_directories_bytes.append(arena, leaf_directory_bytes);
const min_tile_id = switch (input_entries[0]) {
.tile => |t| t.id,
.directory => unreachable,
};
try leaf_directory_locations.put(arena, min_tile_id, .{
.offset = leaf_directories_length,
.length = leaf_directory_bytes.len,
});
leaf_directories_length += leaf_directory_bytes.len;
}
var root_entries: std.ArrayList(Entry) = .empty;
for (spec[0]) |input_entry| {
const pmtiles_entry: Entry = blk: switch (input_entry) {
.tile => |t| {
const loc = tile_locations.get(t.id) orelse return error.TileLocationNotFound;
break :blk .{
.id = t.id,
.run_length = t.run_length,
.offset = loc.offset,
.length = loc.length,
};
},
.directory => |d| {
const loc = leaf_directory_locations.get(d.min_tile_id) orelse return error.LeafLocationNotFound;
break :blk .{
.id = d.min_tile_id,
.run_length = 0,
.offset = loc.offset,
.length = loc.length,
};
},
};
try root_entries.append(arena, pmtiles_entry);
}
const root_directory_bytes = try encodeDirectory(arena, root_entries.items, .none);
return .{
.root_directory_length = root_directory_bytes.len,
.root_directory_bytes = root_directory_bytes,
.leaf_directories_length = leaf_directories_length,
.leaf_directories_bytes = try leaf_directories_bytes.toOwnedSlice(arena),
.tile_data_length = tile_data_length,
.statistics = .{
.number_of_addressed_tiles = number_of_addressed_tiles,
.number_of_tile_contents = number_of_tile_contents,
.number_of_tile_entries = number_of_tile_entries,
},
};
}
fn encodeDirectory(allocator: std.mem.Allocator, entries: []const Entry, internal_compression: Header.Compression) ![]const u8 {
var directory_allocating: std.io.Writer.Allocating = .init(allocator);
defer directory_allocating.deinit();
const directory_writer = &directory_allocating.writer;
try directory_writer.writeUleb128(@as(u64, entries.len));
var last_id: u64 = 0;
for (entries) |entry| {
try directory_writer.writeUleb128(@as(u64, entry.id - last_id));
last_id = entry.id;
}
for (entries) |entry| {
try directory_writer.writeUleb128(@as(u64, entry.run_length));
}
for (entries) |entry| {
try directory_writer.writeUleb128(@as(u64, entry.length));
}
var next_byte: u64 = 0;
for (0.., entries) |idx, entry| {
if (idx > 0 and entry.offset == next_byte) {
try directory_writer.writeUleb128(@as(u64, 0));
} else {
try directory_writer.writeUleb128(@as(u64, entry.offset + 1));
}
next_byte = entry.offset + entry.length;
}
const encoded_bytes = try directory_allocating.toOwnedSlice();
errdefer allocator.free(encoded_bytes);
return switch (internal_compression) {
.none => encoded_bytes,
else => error.UnsupportedInternalCompression,
};
}
};
};
Setting the constant FILESYSTEM_SPEC to input.ZOOM_PARTITIONED_FILESYSTEM generates a file that is identical to the one produced by the hardcoded script. I also included a couple of other FILESYSTEM_SPEC configurations to experiment with.
That's it for now. There are a couple ideas I still want to experiment with, so maybe there will be a Part 2. Until next time!