James Routley

What exactly makes software local-first? I like these three bullet points from Martin Kleppmann’s talk The Past, Present and Future of Local-First, given at the Local-First Conference in Berlin on 30 May 2024 (slides: speakerdeck.com/ept/the-past-present-and-future-of-local-first?slide=15):

If it’s local-only, it’s not local-first.

If it doesn’t work with the wi-fi off, it’s not local-first.

If it doesn’t work if the app developer goes out of business and shuts down their servers, it’s not local-first.

Most local-first software focuses almost exclusively on the first two. I’m certainly guilty of it. In my article about building a local-first travel app (A Local-First Case Study, jakelazaroff.com/words/a-local-first-case-study/), I claimed that the open source sync engine I used wasn’t a critical dependency because you could run your own version if mine went down.

But then I chose a managed service for the sync engine to avoid a big Cloudflare bill. Then the company that ran the service got acquired, and then the service got shut down. My app no longer works; I’d have to put in time and money to bring it back online. You could clone the repo and run it yourself, but let’s be real: no one’s gonna do that.

What if there was a better way?

I’ve been cooking up an idea that I’m pretty excited about. To demo it, I’ve built a simple collaborative text editor.¹

The catch? There’s no sync server. No Zero, no Electric, no Y-Sweet — nada. It’s a real-time collaborative app built purely with static files.

Obviously, that’s not the whole story: there’s a server somewhere syncing stuff! The trick is that the server is atproto (atproto.com).

Most people who’ve heard of atproto know it as the protocol that Bluesky (bsky.social) uses. But it actually powers a whole constellation of apps: Tangled (tangled.org) for version control, Leaflet (leaflet.pub) and Pckt (pckt.blog) for blogging, Wisp (wisp.place) for web hosting, and many more.

Am I not just moving the goalposts here? There are a zillion services that will add real-time sync to your static app. What makes atproto different from Liveblocks or Instant or Convex?

Ink & Switch addresses this in their PushPin case study (www.inkandswitch.com/pushpin/#minimal-dependence-on-servers):

Where servers are used, we want them to be as simple, generic, and fungible as possible, so that one unavailable server can easily be replaced by another. Further, these servers should ideally not be centralised: any user or organisation should be able to provide servers to serve their needs.

Proprietary sync services don’t fit the bill. Replacing one with another can be very difficult!

But my initial interpretation of “simple, generic and fungible” was too narrow. The ability to replace an unavailable server is a low bar, because it places the onus on users to actually do so.

My thesis is that atproto fulfills this plea far better — by relying on a federation of servers maintained by a large and diverse community that already powers a network of applications. If it lives up to its wildest ambitions, depending on atproto will be like depending on DNS or HTTP: public infrastructure so ubiquitous that the risks of lock-in and obsolescence all but disappear.

You can see a working demo of the app at jake.tngl.io/y-pds/. To try it out, sign in with your Internet handle (internethandle.org/) and send the URL to someone.

Architecture

So: no sync server, only atproto. How can we build sync on top of a protocol that doesn’t know about it?

We use a CRDT! CRDTs (Conflict-free Replicated Data Types) are data structures that let us merge changes from multiple sources without a centralized authority to resolve conflicts. Crucially, they ensure that everyone with the same set of changes ends up with exactly the same result.
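To make that last property concrete, here is a minimal sketch using one of the simplest CRDTs, a grow-only set, where merging is just set union. This is a toy for illustration and not how Yjs represents text; the replica names and contents are made up.

```javascript
// A grow-only set (G-Set): the only operation is "add", and merging is set union.
const merge = (a, b) => new Set([...a, ...b]);

// Three replicas make changes independently.
const alice = new Set(["buy milk"]);
const bob = new Set(["walk dog"]);
const carol = new Set(["buy milk", "call mom"]);

// Merge in two different orders, as if the changes arrived over different routes.
const one = merge(merge(alice, bob), carol);
const two = merge(carol, merge(bob, alice));

// Sort before comparing because Set iteration order reflects insertion order.
const sorted = (set) => [...set].sort();
console.log(sorted(one)); // [ 'buy milk', 'call mom', 'walk dog' ]
console.log(JSON.stringify(sorted(one)) === JSON.stringify(sorted(two))); // true
```

Both replicas end up with the same three items even though the merges happened in a different order and one item was added twice.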

You can think of a CRDT as a sort of network-agnostic data layer. It doesn’t matter how the changes get from one client to another, or how long it takes. As long as everyone can eventually find each other and exchange their changes, they’ll all see the same document.

That’s where atproto comes in. Each user is uniquely identified by a DID (Decentralized Identifier). That DID allows other clients to find the user’s PDS (Personal Data Server), which is where all their data lives. A PDS is a bit like a Git repository — except instead of arbitrary files, apps commit structured JSON that any other atproto app can read.

That should be enough background to understand how this all fits together! If you’re interested in learning more about CRDTs, I wrote an interactive intro (jakelazaroff.com/words/an-interactive-intro-to-crdts/) that shows how they work in depth. If you’re interested in learning more about atproto, Dan Abramov has a great post called A Social Filesystem (overreacted.io/a-social-filesystem/) that breaks down how data flows through it.

For this proof of concept, I used a CRDT library called Yjs (yjs.dev). I chose it because it’s popular and good, but another CRDT library like Automerge (automerge.org) or Loro (loro.dev) would work as well.

Let’s go over how this all works. We’ll skip details like authentication and text editing so we can focus on what actually makes this thing tick: the interaction between the CRDT and atproto.

Persistence

Before we can start collaborating with other people, we need to get the Yjs data into our PDS. To do that, we’ll build a Yjs provider, which lets us sync our document with an external source.

Here’s the start of our provider class:

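import * as Y from "yjs";
import { Client, simpleFetchHandler } from "@atcute/client";

class YPdsProvider {
  #ydoc;
  #clients = new Map(); // DID -> client pointed at that user's PDS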

  async #getClient(did) {
    let client = this.#clients.get(did);
    if (client) return client;

    
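    // did:web documents live on the DID's own host; did:plc documents come from the PLC directory
    const url = did.startsWith("did:web:")
      ? `https://${did.slice("did:web:".length)}/.well-known/did.json`
      : `https://plc.directory/${did}`;
    const res = await fetch(url);
    const doc = await res.json();

    // atproto DID documents spell the service type with a capital "A"
    const pds = doc.service?.find((s) => s.type === "AtprotoPersonalDataServer")?.serviceEndpoint;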
    if (!pds) throw new Error(`no PDS found for ${did}`);

    client = new Client({ handler: simpleFetchHandler({ service: pds }) });
    this.#clients.set(did, client);
    return client;
  }

  async #getRecord({ repo, collection, rkey }) {
    const client = await this.#getClient(repo);
    const res = await client.get("com.atproto.repo.getRecord", {
      params: { repo, collection, rkey },
    });
    return res.data;
  }

  async #listRecords({ repo, collection, cursor }) {
    const client = await this.#getClient(repo);
    const res = await client.get("com.atproto.repo.listRecords", {
      params: { repo, collection, limit: 100, cursor },
    });
    return res.data;
  }

  async #upsertRecord({ repo, collection, rkey, record }) {
    const client = await this.#getClient(repo);
    const verb = rkey ? "com.atproto.repo.putRecord" : "com.atproto.repo.createRecord";
    const res = await client.post(verb, {
      input: { repo, collection, rkey, record: { $type: collection, ...record } },
    });
    return res.data;
  }

  
}

I’ve included some convenience methods that use the atcute library (codeberg.org/mary-ext/atcute) to make authenticated HTTP requests to a user’s PDS. The finer details are beyond the scope of this article; you can read a more in-depth overview of how exactly we find each user’s PDS in Dan Abramov’s Where It’s at:// (overreacted.io/where-its-at/).
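As a rough illustration of the lookup step on its own, the sketch below pulls the DID-to-PDS logic out into two pure functions and runs them against a hand-written DID document. The document contents, the example DIDs and the function names are all made up; the service type is assumed to be spelled AtprotoPersonalDataServer, as in the atproto DID specification.

```javascript
// Where to fetch the DID document for the two DID methods atproto supports.
function didDocumentUrl(did) {
  if (did.startsWith("did:web:")) {
    return `https://${did.slice("did:web:".length)}/.well-known/did.json`;
  }
  if (did.startsWith("did:plc:")) return `https://plc.directory/${did}`;
  throw new Error(`unsupported DID method: ${did}`);
}

// Pull the PDS endpoint out of a DID document, or return undefined if there is none.
function findPds(doc) {
  return doc.service?.find((s) => s.type === "AtprotoPersonalDataServer")?.serviceEndpoint;
}

// Hypothetical DID document, trimmed to the fields used here.
const exampleDoc = {
  id: "did:plc:abc123",
  service: [
    { id: "#atproto_pds", type: "AtprotoPersonalDataServer", serviceEndpoint: "https://pds.example.com" },
  ],
};

console.log(didDocumentUrl("did:web:example.com")); // https://example.com/.well-known/did.json
console.log(didDocumentUrl("did:plc:abc123")); // https://plc.directory/did:plc:abc123
console.log(findPds(exampleDoc)); // https://pds.example.com
```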

Whenever a document changes, Yjs emits an update event:

class YPdsProvider {
  // (fields and helper methods from above omitted)
  #ownerDid;
  #rkey;

  constructor({ ydoc, client, atUri, did }) {
    this.#clients.set(did, client);
    this.#ydoc = ydoc;
    this.did = did;

    const [ownerDid, , rkey] = atUri.slice("at://".length).split("/");
    this.#ownerDid = ownerDid;
    this.#rkey = rkey;

    this.#ydoc.on("update", this.#onUpdate);
  }
}

Each event contains a binary description of the change in that update. We’ll encode it as base64 and store it in an updates collection in our PDS, along with a unique document ID so we can tell it apart from updates to other documents:

const encode = (update) => btoa(String.fromCharCode(...update));

class YPdsProvider {
  

  #onUpdate = async (update, origin) => {
    if (origin === this) return;
    await this.#upsertRecord({
      repo: this.did,
      collection: UPDATE_COLLECTION,
      record: { docId: this.#rkey, update: encode(update), createdAt: new Date().toISOString() },
    });
  };
}

We’re now saving document changes to our PDS!
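For a feel of what ends up in the record, here is a small round trip. It is a sketch with a few stand-in bytes instead of a real Yjs update, and the docId value is made up. The encoder here works in chunks, which is a variation on the one-liner above: spreading a very large array into String.fromCharCode can exceed the engine's argument limit, so chunking is a cautious choice for big updates such as a large paste.

```javascript
// Encode in chunks so large updates don't hit the argument limit of String.fromCharCode.
function encode(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return btoa(binary);
}
const decode = (b64) => Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));

const update = new Uint8Array([1, 2, 3, 250, 255]); // stand-in for a Yjs update
const record = { docId: "my-doc", update: encode(update), createdAt: new Date().toISOString() };

console.log(record.update); // AQID+v8=
console.log(decode(record.update)); // Uint8Array(5) [ 1, 2, 3, 250, 255 ]
```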

Next, we need to load our document when the page loads. To do that, we can iterate through all the records in our updates collection, omit the updates with other document IDs and apply all the remaining ones to our document:

const decode = (b64) => Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));

class YPdsProvider {
  

  async load() {
    const updates = await this.#fetchUpdates(this.did);
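    // Tag the transaction with this provider as its origin so #onUpdate
    // doesn't write the updates we just loaded back to the PDS.
    Y.transact(this.#ydoc, () => {
      for (const { value } of updates) {
        Y.applyUpdate(this.#ydoc, decode(value.update), this);
      }
    }, this);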
  }

  async #fetchUpdates(repo) {
    const records = [];
    let cursor;
    do {
      const res = await this.#listRecords({ repo, collection: UPDATE_COLLECTION, cursor });
      records.push(...res.records.filter((r) => r.value.docId === this.#rkey));
      cursor = res.cursor;
    } while (cursor);
    return records;
  }
}

That’s it! In a few lines of code, we’re using our atproto PDS as a persistence layer for Yjs documents.²

There is one optimization we should make. Most users are on a Bluesky-hosted PDS, which means they’re subject to Bluesky’s rate limits (docs.bsky.app/docs/advanced-guides/rate-limits). Bluesky allows users to create 1,666 records in an hour, and if we’re not careful we might blow past that.

To prevent that from happening, we’ll buffer and send our Yjs updates in five-second intervals:

class YPdsProvider {
  

  #pendingUpdates = [];
  #updateTimerId = null;

  
  #flushUpdate = async () => {
    this.#updateTimerId = null;
    if (this.#pendingUpdates.length === 0) return;

    
    const merged =
      this.#pendingUpdates.length === 1
        ? this.#pendingUpdates[0]
        : Y.mergeUpdates(this.#pendingUpdates);
    this.#pendingUpdates = [];

    await this.#upsertRecord({
      repo: this.did,
      collection: UPDATE_COLLECTION,
      record: { docId: this.#rkey, update: encode(merged), createdAt: new Date().toISOString() },
    });
  };

  #onUpdate = (update, origin) => {
    if (origin === this) return;

    
    this.#pendingUpdates.push(update);

    
    if (this.#updateTimerId == null) {
      this.#updateTimerId = setTimeout(this.#flushUpdate, 5000);
    }
  };
}
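The buffering pattern can be tried out in isolation. The sketch below is a standalone version with hypothetical names (createBatcher, schedule) in which the timer is injectable, so the example can fire it by hand instead of waiting five real seconds. It leaves out Yjs and the PDS entirely.

```javascript
// Collects items and hands them to `flush` in one batch per interval.
// `schedule` is injectable so the timing can be controlled by the caller.
function createBatcher(flush, schedule = (fn) => setTimeout(fn, 5000)) {
  let pending = [];
  let scheduled = false;

  return function push(item) {
    pending.push(item);
    if (scheduled) return; // a flush is already on its way
    scheduled = true;
    schedule(() => {
      scheduled = false;
      const batch = pending;
      pending = [];
      flush(batch);
    });
  };
}

// Drive the batcher by hand: "timers" holds callbacks that would otherwise wait 5 seconds.
const timers = [];
const batches = [];
const push = createBatcher((batch) => batches.push(batch), (fn) => timers.push(fn));

push("a");
push("b");
push("c");
timers.shift()(); // the "timer" fires once
console.log(batches); // [ [ 'a', 'b', 'c' ] ]

// At most one record per five seconds means at most this many records per hour.
console.log(3600 / 5); // 720
```

At 720 update records per hour in the worst case, continuous typing stays well under the 1,666 creates per hour mentioned above.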

Collaboration

Working on a document in isolation is no fun. We want to collaborate with other people!

The main challenge here is discovery: how do we find updates from other people working on the same document?

We’ll use the document ID as the “rkey” (record key) of a new record in a docs collection. Records can be fetched directly using their rkey, which means other clients who know both our DID and document ID can get this record without iterating through the entire collection:

class YPdsProvider {
  

  async load() {
    try {
      const data = await this.#getRecord({
        repo: this.#ownerDid,
        collection: DOC_COLLECTION,
        rkey: this.#rkey,
      });
      this.#doc = data.value;
    } catch {
      
      if (this.did !== this.#ownerDid) throw new Error("Document not found");

      this.#doc = { docId: this.#rkey, editors: [], createdAt: new Date().toISOString() };
      await this.#upsertRecord({
        repo: this.did,
        collection: DOC_COLLECTION,
        rkey: this.#rkey,
        record: this.#doc,
      });
    }

    
  }
}

After creating the record, the document owner sends the full atproto URI to their collaborators out of band somehow (via a Bluesky DM, for instance).

What’s that editors array? The document creator adds the DID of every other user they want to be able to edit the document. Each of those users adds any changes they make to the updates collection in their own repo.
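To make the pieces concrete, here is a sketch of a docs record with two editors and the at:// URI the owner would share. The collection NSID, the DIDs and the helper names are made up for illustration, and the parser assumes the URI's authority is a DID.

```javascript
const DOC_COLLECTION = "com.example.ydoc.doc"; // hypothetical NSID

const buildAtUri = ({ did, collection, rkey }) => `at://${did}/${collection}/${rkey}`;

function parseAtUri(atUri) {
  if (!atUri.startsWith("at://")) throw new Error(`not an at:// URI: ${atUri}`);
  const [did, collection, rkey] = atUri.slice("at://".length).split("/");
  if (!did || !collection || !rkey) throw new Error(`incomplete at:// URI: ${atUri}`);
  return { did, collection, rkey };
}

// The record the owner stores in their docs collection.
const docRecord = {
  docId: "3kxyz",
  editors: ["did:plc:bob", "did:web:carol.example.com"],
  createdAt: new Date().toISOString(),
};

const uri = buildAtUri({ did: "did:plc:alice", collection: DOC_COLLECTION, rkey: docRecord.docId });
console.log(uri); // at://did:plc:alice/com.example.ydoc.doc/3kxyz
console.log(parseAtUri(uri).did); // did:plc:alice

// Every repo a client reads updates from: the owner's plus each editor's.
const repos = [parseAtUri(uri).did, ...docRecord.editors];
console.log(repos.length); // 3
```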

When a user loads a document, they first get the record from the docs collection at the known atproto URI. Then, in addition to fetching their own updates, they also fetch the updates from the other editors:

async load() {
  

  const repos = [this.#ownerDid].concat(this.#doc.editors);
  const updates = await Promise.all(repos.map((repo) => this.#fetchUpdates(repo)));

  // Use this provider as the transaction origin so #onUpdate ignores these updates
  Y.transact(this.#ydoc, () => {
    for (const { value } of updates.flat()) {
      Y.applyUpdate(this.#ydoc, decode(value.update), this);
    }
  }, this);
}

What, were you expecting more code? Editor updates are exactly the same as “owner” updates. Because Yjs is a CRDT, we’re guaranteed to have the most up-to-date version of the document once all the updates are applied.³

Realtime Editing

Of course, it’s not a good experience to incorporate other editors’ changes only on page load. We want to see things happening in real time.

atproto can push updates to clients in a few ways. The one we’ll use is called the Jetstream (docs.bsky.app/blog/jetstream): a WebSocket connection that streams events from select repositories and collections as they happen.⁴

After loading the editors list, we connect to the Jetstream, subscribe to those DIDs (plus the owner’s) and listen for new records in the docs and updates collections:

class YPdsProvider {
  

  #subscribe() {
    const url = new URL(this.#jetstream);
    url.searchParams.append("wantedCollections", UPDATE_COLLECTION);
    url.searchParams.append("wantedCollections", DOC_COLLECTION);
    url.searchParams.append("wantedDids", this.#ownerDid);
    for (const editor of this.#doc.editors) url.searchParams.append("wantedDids", editor);

    this.#ws = new WebSocket(url);
    this.#ws.onmessage = async (e) => {
      const event = JSON.parse(e.data);
      this.#cursor = event.time_us;
      if (event.kind !== "commit") return;

      switch (event.commit.collection) {
        
      }
    };
  }
}
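The subscription URL is easy to inspect on its own. The sketch below builds it as a pure function; the endpoint is assumed to be one of Bluesky's public Jetstream instances, and the collection NSIDs, DIDs and the jetstreamUrl name are placeholders.

```javascript
// Build a Jetstream subscription URL filtered to a set of collections and DIDs.
function jetstreamUrl({ endpoint, collections, dids, cursor }) {
  const url = new URL(endpoint);
  for (const c of collections) url.searchParams.append("wantedCollections", c);
  // A Set drops duplicates, e.g. if the owner also appears in the editors list.
  for (const d of new Set(dids)) url.searchParams.append("wantedDids", d);
  if (cursor != null) url.searchParams.set("cursor", String(cursor));
  return url.toString();
}

const subscription = jetstreamUrl({
  endpoint: "wss://jetstream2.us-east.bsky.network/subscribe", // assumed public instance
  collections: ["com.example.ydoc.update", "com.example.ydoc.doc"], // hypothetical NSIDs
  dids: ["did:plc:alice", "did:plc:bob", "did:plc:alice"],
});

console.log(new URL(subscription).searchParams.getAll("wantedDids")); // [ 'did:plc:alice', 'did:plc:bob' ]
```

Repeating a query parameter is how multiple collections and DIDs are passed, which is why the code uses append rather than set.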

Unfortunately, the Jetstream strips away the signing data that verifies that the record actually came from a given user. That means the Jetstream could forge updates!

To prevent this, whenever we get anything from the Jetstream, we’ll also fetch it from the repo and compare the CIDs (content IDs), which are hashes of the content:

class YPdsProvider {
  

  #subscribe() {
    
    this.#ws.onmessage = async (e) => {
      

      const data = await this.#getRecord({
        repo: event.did,
        collection: event.commit.collection,
        rkey: event.commit.rkey,
      });
      if (data.cid !== event.commit.cid) return;
    };
  }
}
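The same check can be written as a standalone function, which makes it easy to exercise without a network. This is a sketch: verifyEvent is a hypothetical name, getRecord is injected (in the provider it would be the #getRecord method), and the CIDs below are fake strings.

```javascript
// Returns true only if the repo's copy of the record has the CID the event claims.
async function verifyEvent(event, getRecord) {
  try {
    const data = await getRecord({
      repo: event.did,
      collection: event.commit.collection,
      rkey: event.commit.rkey,
    });
    return data.cid === event.commit.cid;
  } catch {
    return false; // record missing or unreachable: treat the event as unverified
  }
}

// Fake repo lookup standing in for a real PDS request.
const fakeRepo = async () => ({ cid: "bafy-real", value: {} });
const event = {
  did: "did:plc:abc123",
  commit: { collection: "com.example.ydoc.update", rkey: "3kxyz", cid: "bafy-real" },
};
const forged = { ...event, commit: { ...event.commit, cid: "bafy-forged" } };

verifyEvent(event, fakeRepo).then((ok) => console.log(ok)); // true
verifyEvent(forged, fakeRepo).then((ok) => console.log(ok)); // false
```

One side effect worth knowing: a record that was overwritten between the event and the fetch also fails the check, so its event is simply skipped.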

If the CIDs match, we’re good to go. For the updates collection, that means applying the updates to our document:

class YPdsProvider {
  

  #subscribe() {
    
    this.#ws.onmessage = async (e) => {
      
      switch (event.commit.collection) {
        case UPDATE_COLLECTION: {
          if (event.commit.record.docId !== this.#rkey) return;
          if (event.did !== this.#ownerDid && !this.#doc.editors.includes(event.did)) return;
          if (event.commit.operation !== "create") return;

          
          

          Y.applyUpdate(this.#ydoc, decode(event.commit.record.update), this);
          break;
        }
      }
    };
  }
}
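Those three guards can also be read as a single predicate. The sketch below restates them as a pure function with a few sample events; the collection NSID, the DIDs and the shouldApplyUpdate name are placeholders, and the event objects only include the fields the checks look at.

```javascript
const UPDATE_COLLECTION = "com.example.ydoc.update"; // hypothetical NSID

// True if a Jetstream event is a new update for this document from someone allowed to edit it.
function shouldApplyUpdate(event, { docId, ownerDid, editors }) {
  if (event.kind !== "commit") return false;
  const { commit } = event;
  if (commit.collection !== UPDATE_COLLECTION) return false;
  if (commit.operation !== "create") return false;
  if (commit.record?.docId !== docId) return false;
  return event.did === ownerDid || editors.includes(event.did);
}

const doc = { docId: "3kxyz", ownerDid: "did:plc:alice", editors: ["did:plc:bob"] };
const makeEvent = (did, docId, operation = "create") => ({
  did,
  kind: "commit",
  commit: { collection: UPDATE_COLLECTION, operation, record: { docId } },
});

console.log(shouldApplyUpdate(makeEvent("did:plc:bob", "3kxyz"), doc)); // true
console.log(shouldApplyUpdate(makeEvent("did:plc:mallory", "3kxyz"), doc)); // false
console.log(shouldApplyUpdate(makeEvent("did:plc:bob", "other-doc"), doc)); // false
```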

For the docs collection, an event means the list of editors may have changed. The easiest way to handle that is to simply disconnect from the Jetstream and reload the document with the new list of editors:⁵

class YPdsProvider {
  

  #subscribe() {
    
    this.#ws.onmessage = async (e) => {
      
      switch (event.commit.collection) {
        
        case DOC_COLLECTION: {
          if (event.did !== this.#ownerDid || event.commit.rkey !== this.#rkey) return;
          await this.load();
          break;
        }
      }
    };
  }
}

It’s possible that we could miss an event while disconnected from the Jetstream, not only during this process but also when, say, a laptop is closed. To solve this, we keep track of the timestamp (time_us) of the most recent event we received and pass it as a cursor when connecting to the Jetstream:

class YPdsProvider {
  

  #subscribe() {
    
    if (this.#cursor) url.searchParams.set("cursor", this.#cursor);

    this.#ws = new WebSocket(url);
    this.#ws.onmessage = async (e) => {
      const event = JSON.parse(e.data);
      this.#cursor = event.time_us;
      
    };
  }
}
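One possible refinement, which is not part of the provider above: rewind the cursor by a few seconds when reconnecting, so that events which arrived around the moment of disconnection are replayed instead of skipped. Replays are harmless here because applying the same Yjs update twice leaves the document unchanged. The function name and the five-second default are assumptions.

```javascript
// Jetstream cursors are Unix timestamps in microseconds.
function reconnectCursor(lastTimeUs, rewindSeconds = 5) {
  return Math.max(0, lastTimeUs - rewindSeconds * 1_000_000);
}

console.log(reconnectCursor(1725911162329308)); // 1725911157329308
console.log(reconnectCursor(1725911162329308, 0)); // 1725911162329308
```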

Awareness

Let’s add one last finishing touch. You’re probably familiar with this from Google Docs: live cursor positions as other editors navigate the document.

Yjs calls this type of feature awareness (docs.yjs.dev/getting-started/adding-awareness). It still uses a CRDT to store this ephemeral state, but outside of the document and with no edit history.

Because the constraints are so different, we’ll use a dedicated awareness collection to store this state. Rather than create a new record with every change, we can create a single record using the document ID and overwrite it when our awareness state changes. We’ll also want to throttle this to prevent our users from hitting rate limits:

import { applyAwarenessUpdate, encodeAwarenessUpdate } from "y-protocols/awareness";

class YPdsProvider {
  

  #awarenessDirty = false;
  #awarenessTimerId = null;

  
  #flushAwareness = async () => {
    this.#awarenessTimerId = null;
    if (!this.#awarenessDirty) return;
    this.#awarenessDirty = false;

    const update = encodeAwarenessUpdate(this.awareness, [this.awareness.clientID]);
    await this.#upsertRecord({
      repo: this.did,
      collection: AWARENESS_COLLECTION,
      rkey: this.#rkey,
      record: { docId: this.#rkey, awareness: encode(update), createdAt: new Date().toISOString() },
    });
  };

  #onAwarenessChange = ({ added, updated }) => {
    const ours = this.awareness.clientID;
    if (!added.includes(ours) && !updated.includes(ours)) return;

    
    this.#awarenessDirty = true;

    
    if (this.#awarenessTimerId == null) {
      this.#awarenessTimerId = setTimeout(this.#flushAwareness, 5000);
    }
  };
}

As with updates, when loading a document, we merge the awareness records from the other editors:

class YPdsProvider {
  

  async load() {
    

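    // include the owner's repo as well as the editors', and skip our own record
    for (const editor of [this.#ownerDid, ...this.#doc.editors]) {
      if (editor === this.did) continue;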
      try {
        const data = await this.#getRecord({
          repo: editor,
          collection: AWARENESS_COLLECTION,
          rkey: this.#rkey,
        });
        applyAwarenessUpdate(this.awareness, decode(data.value.awareness), "remote");
      } catch {
        
      }
    }
  }
}

We also listen for updates to the awareness collection in the Jetstream to get live changes:

class YPdsProvider {
  

  #subscribe() {
    
    this.#ws.onmessage = async (e) => {
      
      switch (event.commit.collection) {
        
        case AWARENESS_COLLECTION: {
          if (event.commit.record.docId !== this.#rkey) return;
          if (event.did !== this.#ownerDid && !this.#doc.editors.includes(event.did)) return;
          if (event.did === this.did) return;

          applyAwarenessUpdate(this.awareness, decode(event.commit.record.awareness), "remote");
          break;
        }
      }
    };
  }
}

And there we have it! A real-time collaborative text editor with no dedicated server, using atproto to sync our changes.

Caveats

Alas: nothing in this life is perfect.

The lack of signed data in the Jetstream isn’t ideal. Comparing the CID to the actual PDS record makes it much harder for a malicious Jetstream relay to spoof updates, but it would be nice to actually cryptographically verify the updates are legit.

Also, all the data is public. We could try to encrypt it somehow, but it still makes me uneasy to have private data floating out there in my PDS — even if it’s ostensibly protected. The proper solution there is probably to wait for atproto to support private and shared private data.⁶

I don’t love that there’s an initial document owner who determines which other users can edit it. I guess that’s the Google Docs–type UX that people expect, but I see it as a limitation. CRDTs can be merged without a centralized authority, so in theory a document doesn’t need an “owner” — every copy can be its own canonical version, as though you’d emailed someone a Photoshop document or Excel spreadsheet.

Relatedly, there’s no good way to have publicly editable documents. I wanted this blog post to have an embedded demo that everyone could edit, but the document owner needs to specifically add each person to the editors list.

Finally, it’s a bit choppy! The Jetstream latency seems to be just too high to make remote keystrokes appear “live” (which makes sense, since we’re filtering down a firehose of all atproto network activity). I have some ideas for how to dramatically improve the performance, though — watch this space!

Parting Thoughts

Reflecting on the atproto conference in Vancouver this past weekend, Chad Fowler posted on Bluesky (bsky.app/profile/chadfowler.com/post/3mi67e2bgy222):

Observation from #ATmosphereConf: The AT folks and localfirst folks (not to mention the “good parts” of web3) are attacking the same problems from different angles. There is not NEARLY enough overlap in these communities.

I totally agree. Both communities have a lot of shared goals: user agency, data ownership, interoperability, files over apps (stephango.com/file-over-app). My hope in publishing this is to inspire more people to play around where the two overlap.

By The Way

When I was working on this proof of concept, I stumbled on a package called y-atproto (npmx.dev/package/y-atproto) that works very similarly to my implementation. npm install it and give it a whirl!

Building More Resilient Local-First Software with atproto