James Routley

About ten years ago, I started thinking in earnest about how we could make it easier to write correct programs. Researching this question led me to topics like formal methods and type systems, techniques to help establish that a given program adheres to some rules. However, I was still unsure of how to prove that software was actually correct. Not in the sense that the executed instructions produce a result consistent with the specification, but in the sense that this program actually does what the people involved want it to do.

Unfortunately, this leads to an easy description of a seemingly impossible task: software is correct when everyone involved knows what the program should do and can confirm that the program only does what it should. In one sense, this is impossible. There's the program itself, but then there's the Program. The capital-P Program is the concept that lives in our heads. Agreeing that a program is correct is, in one sense, agreeing that we're all thinking the same thought (I guess I'm assuming this isn't possible; the philosphical feasibility of this is out of scope of this post). This leads to a more pedantic version of my previous statement: a program is correct when everyone involved agrees that the real-world program in effect is an adequate representation of the Program in their heads and the real-world program is doing only what it should. So, how do we know that the program as it exists is the Program we imagined?

The concept that I think has helped me the most when thinking about this problem is the semantic gap. The semantic gap, to me, highlights what we lose in the tradeoff of formalizing ideas into code. Some people can read the code, see that it reflects the Program in their heads, and say "lgtm". I think that's the ideal pattern of using code to decrease the semantic gap between people and our programs. But some people will read the same code and realize they have no idea how it connects to the Program in their heads. Having exhausted our tooling for reducing the semantic gap, and under pressure to keep things moving, what are we to do but solemnly type "lgtm"? It's clear that reading code cannot be the only method of communicating the intent of the ideas behind it. At the same time, though, I think the executable code has a compelling case to be the source of truth. But if code is an ineffective means of communication even among programmers, how do we help people understand programs? This, I think, is where program analysis comes in.

The way I think about program analysis is that it's a way for me to look at a program I wrote, ask "what have I done", and get a meaningful result. There are plenty of ways to do this, and while "running the code" is the most popular the branch of analysis I'm interested in is called static analysis. It's interesting to me because, ideally, we could take a set of specified components and ask questions about what the whole system is capable of but without using the resources involved in running the code. Does this program ever try to access specific data? Is there a way to get to this web page without logging in? To put it another way, there are decisions that need to be made within each program and we'd like confirmation that those decisions are being made carefully.

But decisions aren't always made in isolation and aren't always made by people who can read the code to verify it does the right thing. Being able to inspect systems — providing consistent and accurate answers to the questions people have about the programs that play a role in our everyday lives — is a necessity to ever have correct software. Even as the author of a program, we are only one blind person touching one part of the elephant. We need the perspective and understanding of others to confirm we've done the right thing.

My practitioner view of program analysis

My practitioner view of program analysis