Back Original

Show HN: I ported Tree-sitter to Go

It’s hard to get optimizers right. Even if you build up a painstaking test suite by hand, you will likely miss corner cases, especially corner cases at the interactions of multiple components or multiple optimization passes.

I wanted to see if I could write a fuzzer to catch some of these bugs automatically. But a fuzzer alone isn’t much use without some correctness oracle—in this case, we want a more interesting bug than accidentally crashing the optimizer. We want to see if the optimizer introduces a correctness bug in the program.

So I set off in the most straightforward way possible, inspired by my hazy memories of a former CF blog post.

Generating programs

Generating random programs isn’t so bad. We have program generation APIs and we can dynamically pick which ones we want to call. I wrote a small loop that generates loads from and stores to the arguments at random offsets and with random values, and escapes to random instructions with outputs. The idea with the escape is to keep track of the values as if there was some other function relying on them.

def generate_program():
    bb = Block()
    args = [bb.getarg(i) for i in range(3)]
    num_ops = random.randint(0, 30)
    ops_with_values = args[:]
    for _ in range(num_ops):
        op = random.choice(["load", "store", "escape"])
        arg = random.choice(args)
        a_value = random.choice(ops_with_values)
        offset = random.randint(0, 4)
        if op == "load":
            v = bb.load(arg, offset)
            ops_with_values.append(v)
        elif op == "store":
            value = random.randint(0, 10)
            bb.store(arg, offset, value)
        elif op == "escape":
            bb.escape(a_value)
        else:
            raise NotImplementedError(f"Unknown operation {op}")
    return bb

This generates random programs. Here is an example stringified random program:

var0 = getarg(0)
var1 = getarg(1)
var2 = getarg(2)
var3 = load(var2, 0)
var4 = load(var0, 1)
var5 = load(var1, 1)
var6 = escape(var0)
var7 = store(var0, 2, 3)
var8 = store(var2, 0, 7)

No idea what would generate something like this, but oh well.

Verifying programs

Then we want to come up with our invariants. I picked the invariant that, under the same preconditions, the heap will look the same after running an optimized program as it would under an un-optimized program1. So we can delete instructions, but if we don’t have a load-bearing store, store the wrong information, or cache stale loads, we will probably catch that.

def verify_program(bb):
    before_no_alias = interpret_program(bb, ["a", "b", "c"])
    a = "a"
    before_alias = interpret_program(bb, [a, a, a])
    optimized = optimize_load_store(bb)
    after_no_alias = interpret_program(optimized, ["a", "b", "c"])
    after_alias = interpret_program(optimized, [a, a, a])
    assert before_no_alias == after_no_alias
    assert before_alias == after_alias

I have a very silly verifier that tests two cases: one where the arguments do not alias and one where they are all the same object. Generating partial aliases would be a good extension here.

Last, we have the interpreter.

Running programs

The interpreter is responsible for keeping track of the heap (as indexed by (object, offset) pairs) as well as the results of the various instructions.

We keep track of the escaped values so we can see results of some instructions even if they do not get written back to the heap. Maybe we should be escapeing all instructions with output instead of only random ones. Who knows.

def interpret_program(bb, args):
    heap = {}
    ssa = {}
    escaped = []
    for op in bb:
        if op.name == "getarg":
            ssa[op] = args[get_num(op, 0)]
        elif op.name == "store":
            obj = ssa[op.arg(0)]
            offset = get_num(op, 1)
            value = get_num(op, 2)
            heap[(obj, offset)] = value
        elif op.name == "load":
            obj = ssa[op.arg(0)]
            offset = get_num(op, 1)
            value = heap.get((obj, offset), "unknown")
            ssa[op] = value
        elif op.name == "escape":
            value = op.arg(0)
            if isinstance(value, Constant):
                escaped.append(value.value)
            else:
                escaped.append(ssa[value])
        else:
            raise NotImplementedError(f"Unknown operation {op.name}")
    heap["escaped"] = escaped
    return heap

Then we return the heap so that the verifier can check.

The harness

Then we run a bunch of random tests through the verifier!

def test_random_programs():
    # Remove random.seed if using in CI... instead print the seed out so you
    # can reproduce crashes if you find them
    random.seed(0)
    num_programs = 100000
    for i in range(num_programs):
        program = generate_program()
        verify_program(program)

The number of programs is configurable. Or you could make this while True. But due to how simple the optimizer is, we will find all the possible bugs pretty quickly.

I initially started writing this post because I thought I had found a bug, but it turns out that I had, with CF’s help, in 2022, walked through every possible case in the “buggy” situation, and the optimizer handles those cases correctly. That explains why the verifier didn’t find that bug!

Testing the verifier

So does it work? If you run it, it’ll hang for a bit and then report no issues. That’s helpful, in a sense… it’s revealing that it is unable to find a certain class of bug in the optimizer.

Let’s comment out the main load-bearing pillar of correctness in the optimizer—removing aliasing writes—and see what happens.

We get a crash nearly instantly:

$ uv run --with pytest pytest loadstore.py -k random
...
=========================================== FAILURES ============================================
_____________________________________ test_random_programs ______________________________________

    def test_random_programs():
        random.seed(0)
        num_programs = 100000
        for i in range(num_programs):
            program = generate_program()
>           verify_program(program)

loadstore.py:617:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

bb = [Operation(getarg, [Constant(0)], None, None), Operation(getarg, [Constant(1)], None, None), Operation(getarg, [Consta...], None, None)], None, None), Operation(load, [Operation(getarg, [Constant(0)], None, None), Constant(0)], None, None)]

    def verify_program(bb):
        before_no_alias = interpret_program(bb, ["a", "b", "c"])
        a = "a"
        before_alias = interpret_program(bb, [a, a, a])
        optimized = optimize_load_store(bb)
        after_no_alias = interpret_program(optimized, ["a", "b", "c"])
        after_alias = interpret_program(optimized, [a, a, a])
        assert before_no_alias == after_no_alias
>       assert before_alias == after_alias
E       AssertionError: assert {('a', 0): 4,...', 3): 1, ...} == {('a', 0): 9,...', 3): 1, ...}
E
E         Omitting 4 identical items, use -vv to show
E         Differing items:
E         {('a', 0): 4} != {('a', 0): 9}
E         Use -v to get more diff

loadstore.py:610: AssertionError
==================================== short test summary info ====================================
FAILED loadstore.py::test_random_programs - AssertionError: assert {('a', 0): 4,...', 3): 1, ...} == {('a', 0): 9,...', 3): 1, ...}
=============================== 1 failed, 15 deselected in 0.04s ================================
$

We should probably use bb_to_str(bb) and bb_to_str(optimized) to print out the un-optimized and optimized traces in the assert failure messages. But we get a nice diff of the heap automatically, which is neat. And it points to an aliasing problem!

Full code

See the full code.

Extensions

Thanks

Thank you to CF Bolz-Tereick for feedback on this post!