
Some notes on architecture diagrams

I look at a lot of architecture diagrams. Most of them are not about systems I personally shepherd, and I have no control over what is presented to me. But boy, do I have opinions.

In this blog post, I outline some principles I see as immutable, regardless of system shape, and explain why I hold these beliefs. I explain what views I—and probably you—need. And finally, we talk a bit about how to produce them while acknowledging that this is dependent on which technologies you use.

The point isn’t to generate pretty pictures; in fact, I couldn’t care less about the layout or color scheme (okay, okay, readability would be nice). I want truthfulness: a diagram I can trust as drawn, without having to memorize a bunch of verbal addenda and errata when I meet with the architect for the first time.

The cool thing about these diagrams is that we all want the same things from them: no matter if you are an architect, an individual contributor, an auditor, or involved in due diligence, you want to see data flow, external surface area, network segmentation, data warehousing, and so on. If we do things right, we can use the same diagrams (or maybe with slight redactions) for onboarding, Architecture Decision Records, threat modelling and other workshops, and DD and audits. Wouldn’t that be nice?

Some principles

Below is a quick list of my principles. If you do most of these, we can be friends (if you don’t, I’ll still be your friend, but I’ll judge you for your life choices).

Now, if we had a tool that made it easy and cheap to create views of your system, this would all be so simple…

Some anti-patterns

Of course, all principles come with their inverse. Here are some things I don’t want to see or do:

The views we both need

Whether we look at the system from the outside or live inside it, we need the same views. For the purposes of this blog post, I thought of some snappy names, but I suck at naming, so you’ll need to go by their descriptions.

These first three are essentially the C4 model views, and we just add some other useful things on top.

I know that some of these are easier to generate than others, and creating all of these accurately and automatically isn’t always feasible. This is a best-in-class write-up, and for some systems might remain aspirational. Try to get as close to it as possible (and then back off when the asymptotes kick in).

Data sources and what to generate

Here are some things that might serve as data sources for your diagrams. These are so specific to technologies used that I won’t even try to cover all of it. Take these as examples, not as a full list.

Hopefully, you won’t need all of these data sources. But whatever you take, consolidate it into a format of your choice (JSON or YAML are usually good choices) that is then used by renderers to actually produce pictures (or Mermaid/PlantUML/whatever diagrams).
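
As a rough illustration of the "consolidate, then render" idea, here is a minimal Python sketch. The model layout (`nodes`, `edges`, `id`, `label`, `src`, `dst`, `protocol`) and the service names are assumptions made up for this example, not a standard schema; the renderer simply emits a Mermaid flowchart from the JSON.

```python
import json

# Hypothetical consolidated model. Field names and services are
# assumptions for this sketch, not a standard schema.
MODEL_JSON = """
{
  "nodes": [
    {"id": "web", "label": "Web App"},
    {"id": "api", "label": "Orders API"},
    {"id": "db", "label": "Orders DB"}
  ],
  "edges": [
    {"src": "web", "dst": "api", "protocol": "HTTPS"},
    {"src": "api", "dst": "db", "protocol": "PostgreSQL"}
  ]
}
"""


def render_mermaid(model: dict) -> str:
    """Render the model as a left-to-right Mermaid flowchart."""
    lines = ["flowchart LR"]
    for node in model["nodes"]:
        # Mermaid node syntax: id["visible label"]
        lines.append(f'    {node["id"]}["{node["label"]}"]')
    for edge in model["edges"]:
        # Mermaid labelled edge syntax: a -->|label| b
        lines.append(f'    {edge["src"]} -->|{edge["protocol"]}| {edge["dst"]}')
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mermaid(json.loads(MODEL_JSON)))
```

Because the renderer only ever sees the consolidated model, swapping Mermaid for PlantUML or anything else means writing another small function, not touching the data sources.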

A production pipeline sketch

I sketched an example GitLab pipeline (full disclosure: I used an LLM for a lot of this, so you’ll probably have to either fix most of this or just take it as pseudocode) in this gist. It showcases all the steps and generations.

This might seem like a lot (and it is!), but keep in mind that this is the maximalist approach, and you don’t have to start with the full package. Start with something simple and work your way up. Build an MVP in a day and then start iterating (context and data flow are two good ones to start with).

Metadata overlays

Some metadata will probably not be captured in these generations. We either inline the metadata about each service in the diagram, or we link to the service documentation and/or source inside the diagram.

For each node, we need owners, SLAs, data privacy and authZ information, external dependencies, and so on somewhere. For each edge, we need protocol, transport, auth, rate, etc. Not all of this should be inline, because it will crowd the diagram. Link to external sources of information wherever possible.
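
One way to picture the split between inline and linked metadata is the Python sketch below. The `Node` and `Edge` classes, their fields, and the `example.com` URL are all hypothetical; the only external convention relied on is Mermaid's `click <id> href "<url>"` directive for turning a node into a link.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str
    owner: str
    docs_url: str  # link to the service documentation and/or source
    details: dict = field(default_factory=dict)  # SLAs, privacy, authZ, ...


@dataclass
class Edge:
    src: str
    dst: str
    protocol: str
    auth: str
    details: dict = field(default_factory=dict)  # transport, rate, ...


def node_label(node: Node) -> str:
    """Inline only the name and owner; the rest stays behind docs_url."""
    return f"{node.label} ({node.owner})"


def edge_label(edge: Edge) -> str:
    """Inline only protocol and auth to keep the edge readable."""
    return f"{edge.protocol}, {edge.auth}"


def mermaid_click(node: Node) -> str:
    """Mermaid directive that makes the node a hyperlink to its docs."""
    return f'click {node.id} href "{node.docs_url}"'


if __name__ == "__main__":
    api = Node("api", "Orders API", "team-orders",
               "https://example.com/docs/orders-api", {"sla": "99.9%"})
    print(node_label(api))
    print(mermaid_click(api))
```

What goes inline is a judgment call; this sketch picks owner for nodes and protocol plus auth for edges, and pushes everything else behind the link.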

A quick note on redactions

There will be a point in a company’s lifetime when these diagrams need to be shared externally. Maybe some of them are part of the external documentation. Maybe they’re being handed over to a potential buyer or investor.

Some data will then need to be redacted. The nice thing about a data- and model-first approach, though, is that redactions are easy to implement: they are just filters over the model, applied before rendering.

And, since you have multiple views of the system, you can additionally control sharing by choosing the view. You can throw out the nitty-gritty like the runtime topology or network segmentation without any additional work.
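
To make "redactions are just filters" concrete, here is a small Python sketch. It assumes a hypothetical model in which each node carries an optional `tags` list and edges reference nodes by `src` and `dst`; none of these names come from a real tool.

```python
import copy


def redact(model: dict, hidden_tags: set, dropped_fields: set) -> dict:
    """Return a redacted copy of the model; the input is left untouched.

    - nodes carrying any tag in hidden_tags are removed
    - edges touching a removed node are removed
    - fields named in dropped_fields are stripped from what remains
    """
    out = copy.deepcopy(model)
    out["nodes"] = [
        n for n in out["nodes"]
        if not (set(n.get("tags", [])) & hidden_tags)
    ]
    kept_ids = {n["id"] for n in out["nodes"]}
    out["edges"] = [
        e for e in out["edges"]
        if e["src"] in kept_ids and e["dst"] in kept_ids
    ]
    for item in out["nodes"] + out["edges"]:
        for name in dropped_fields:
            item.pop(name, None)
    return out


if __name__ == "__main__":
    # Hypothetical model used only for illustration.
    model = {
        "nodes": [
            {"id": "web", "tags": ["public"], "owner": "team-web"},
            {"id": "vault", "tags": ["internal"], "owner": "team-sec"},
        ],
        "edges": [{"src": "web", "dst": "vault", "protocol": "HTTPS"}],
    }
    print(redact(model, hidden_tags={"internal"}, dropped_fields={"owner"}))
```

The redacted copy can go through the same renderers as the internal model, so an external view costs one extra filter step rather than a separately maintained diagram.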

Fin

In this blog post, we looked at how to build useful architecture diagrams for internal and external use. We talked about principles and anti-patterns, views and how to create and manage them.

I hope you were able to take something useful away from this blog post that will elevate the usefulness of your diagrams. They are the first view into your system, and they are what you’ll see when you close your eyes to think about your architecture.