An Abstract Syntax Tree (AST) is a representation of the source code structure as data. It is called 'abstract' because it removes surface details like comments, formatting or extra parentheses.
It can be thought of as a nested dictionary or JSON-like document that describes the code.
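As a rough illustration, the expression (1 + 2) - 3 could be written as the nested dictionary below. The schema used here (a "type" key plus "left", "right" and "value" fields) is an assumption made for this sketch, not a standard format.

```python
import json

# Hypothetical schema: every node is a dict with a "type" key.
tree = {
    "type": "Subtract",
    "left": {
        "type": "Add",
        "left": {"type": "Number", "value": 1},
        "right": {"type": "Number", "value": 2},
    },
    "right": {"type": "Number", "value": 3},
}

# The tree is plain data, so it serializes like any other JSON document.
print(json.dumps(tree, indent=2))
```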
Representing code as data allows writing all sorts of software: Abstract Syntax Trees are used in interpreters, compilers (including things like babel in JavaScript), linters (flake8/black/ruff in Python, eslint in JavaScript) and many more tools.
These programs can be surprisingly approachable, at least conceptually.
def evaluate(node):
    """An ast interpreter for basic arithmetic expressions."""
    if node["type"] == "Add":
        return evaluate(node["left"]) + evaluate(node["right"])
    if node["type"] == "Subtract":
        return evaluate(node["left"]) - evaluate(node["right"])
    if node["type"] == "Number":
        return node["value"]
    raise ValueError(f"unsupported node type {node['type']}")
While a linter might do things like:
def lint(node):
    # Nodes are dictionaries here, as in the interpreter above.
    if node["type"] == "FunctionDefinition" and len(node["args"]) > 5:
        report_error(node["location"], "too many parameters in function")
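Python exposes its own AST through the standard library ast module. As a minimal sketch, ast.parse turns source code into a tree, and the built-in compile accepts that tree in place of a source string. The variable names and the "<ast>" filename are arbitrary choices for this example, and the indent argument of ast.dump assumes Python 3.9 or later.

```python
import ast

source = "result = 1 + 2 - 3"

# Parse the source code into a tree of ast nodes.
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

# compile() accepts an AST as well as a source string.
code = compile(tree, filename="<ast>", mode="exec")

namespace = {}
exec(code, namespace)
print(namespace["result"])  # 0
```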
This direct access to the AST means we can modify the tree before calling compile, changing the program before it runs.
These modifications are known as AST transforms.
AST transforms get the source code to behave like different source code.
A fun example is what pytest does to display nice error messages when an assert fails.
Python's default behavior on an AssertionError is to print:
Traceback (most recent call last):
  File "/home/laurent/test/assert.py", line 2, in <module>
    assert a == b
AssertionError
Want to know the values of a and b? Add print statements and run the code again.
Pytest, on the other hand, displays them directly:
>       assert a == b
E       assert 1 == 2
assert.py:3: AssertionError
It is a little magical: we did not change the code, we just used pytest test_module.py instead of python test_module.py.
It is still Python executing the code, only when running with pytest, instead of:
assert a == b
Python executes something like:
try:
    assert a == b
except AssertionError:
    [pytest-generated code to display a nice error message]
    raise
Pytest does that without modifying the source code (your files on disk), by transforming the AST before Python gets to execute it.
An AST transform takes a tree as input and modifies it. ast.NodeTransformer is a helper class that traverses the tree for us.
An AST is made of many different node types (ast.Assign, ast.Call, ast.Name and many more).
NodeTransformer walks the tree and calls the visit_<node_type> methods when they exist.
The example from the official documentation replaces all variables (name lookups) with data["variable_name"].
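That documentation example can be sketched as follows. The RewriteName class mirrors the one in the ast module documentation; the sample source line and the data dictionary are assumptions added here to make the snippet runnable. It assumes Python 3.9 or later, where a Constant can be used directly as a subscript slice.

```python
import ast


class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        # Replace the name lookup `foo` with the subscript `data['foo']`,
        # keeping the original context (load or store).
        return ast.Subscript(
            value=ast.Name(id="data", ctx=ast.Load()),
            slice=ast.Constant(value=node.id),
            ctx=node.ctx,
        )


tree = ast.parse("total = price * quantity")
new_tree = ast.fix_missing_locations(RewriteName().visit(tree))
print(ast.unparse(new_tree))  # data['total'] = data['price'] * data['quantity']

# The transformed tree can be compiled and executed like any other.
data = {"price": 2, "quantity": 3}
exec(compile(new_tree, filename="<ast>", mode="exec"), {"data": data})
print(data["total"])  # 6
```

NodeTransformer only calls visit_Name on nodes that were in the original tree, so the data name introduced by the transform is not rewritten again.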
For our own transform, the goal is to make assert a == b behave like:
try:
    assert a == b
except AssertionError:
    raise AssertionError(f"a == b failed\na = {a}\nb = {b}")
This achieves a behavior similar to pytest: it improves on Python's assert by showing us the values of variables in the error message.
General advice to write such a transformer:
We don't need to know all the node types ahead of time. It's easy to pick them up as needed.
It helps a lot to visualize the source and target ASTs, with a web tool or ast.dump.
ast.unparse can be used to convert an AST to source code: useful to test the transformer.
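As a short sketch of that advice, the snippet below dumps the tree for the original assert and for the hand-written target code, then converts a tree back to source. The variable names are arbitrary, and the indent argument of ast.dump assumes Python 3.9 or later.

```python
import ast

source = "assert a == b"
target = (
    "try:\n"
    "    assert a == b\n"
    "except AssertionError:\n"
    "    raise AssertionError(f'a == b failed\\na = {a}\\nb = {b}')\n"
)

source_tree = ast.parse(source)
target_tree = ast.parse(target)

# Print the first (and only) statement of each module as an indented tree.
print(ast.dump(source_tree.body[0], indent=2))
print(ast.dump(target_tree.body[0], indent=2))

# ast.unparse goes the other way: from a tree back to source code.
print(ast.unparse(source_tree))  # assert a == b
```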
Some moderately ugly visualizations of the source and target ASTs:
[Figure: source Assert node, showing Assert and its test attribute]
[Figure: target node we want to replace the Assert node with, showing Try, body[0], handlers[0], body[0], exc, args[0] and a collapsed JoinedStr subtree]
Looking at the code and the corresponding trees:
We want to replace each Assert node with a Try node that contains the original Assert node and has an exception handler with a custom Raise node.
We'll need to construct a few new AST nodes like JoinedStr, Constant or FormattedValue.
We'll want the a == b part of assert a == b as a string literal, to include in our custom error message.
The Assert node exposes this expression via the test attribute, which we can turn back into a code string with ast.unparse.
Along the way we will need to collect the variables used in the Assert node (a and b in our example) so they can be included in the error message.
Variables are represented as Name nodes, where the id attribute is the variable name.
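A minimal sketch of a transformer following these steps could look like the code below. The names AssertTransformer and transform_source are illustrative choices, not pytest APIs, and the sketch only handles plain variable names, as described above.

```python
import ast


class AssertTransformer(ast.NodeTransformer):
    """Wrap each assert in a try/except that re-raises with variable values."""

    def visit_Assert(self, node):
        # Source text of the asserted expression, e.g. "a == b".
        test_source = ast.unparse(node.test)

        # Collect the variable names used in the expression, without duplicates.
        names = []
        for child in ast.walk(node.test):
            if isinstance(child, ast.Name) and child.id not in names:
                names.append(child.id)

        # Build the f-string "a == b failed\na = {a}\nb = {b}".
        parts = []
        pending = f"{test_source} failed"
        for name in names:
            parts.append(ast.Constant(value=f"{pending}\n{name} = "))
            parts.append(
                ast.FormattedValue(
                    value=ast.Name(id=name, ctx=ast.Load()),
                    conversion=-1,  # -1 means no !r, !s or !a conversion
                )
            )
            pending = ""
        if pending:
            parts.append(ast.Constant(value=pending))
        message = ast.JoinedStr(values=parts)

        # raise AssertionError(<message>)
        raise_node = ast.Raise(
            exc=ast.Call(
                func=ast.Name(id="AssertionError", ctx=ast.Load()),
                args=[message],
                keywords=[],
            ),
            cause=None,
        )

        # try: <original assert> / except AssertionError: <raise_node>
        try_node = ast.Try(
            body=[node],
            handlers=[
                ast.ExceptHandler(
                    type=ast.Name(id="AssertionError", ctx=ast.Load()),
                    name=None,
                    body=[raise_node],
                )
            ],
            orelse=[],
            finalbody=[],
        )
        return ast.copy_location(try_node, node)


def transform_source(source):
    """Parse `source`, rewrite its asserts and return the new tree."""
    tree = AssertTransformer().visit(ast.parse(source))
    # New nodes have no line numbers yet; compile() requires them.
    return ast.fix_missing_locations(tree)


tree = transform_source("assert a == b")
print(ast.unparse(tree))
```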
This program transforms the AST for assert a == b and prints code for the transformed AST. The result matches our target source code.
This could be adapted to take a file as input, compile the modified tree and exec it. This would give an interface similar to pytest: test_runner.py test_module.py.
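One possible shape for such a runner is sketched below. The function names run_file and main are hypothetical, and the transformer is passed in as a parameter: the default used in main is the base ast.NodeTransformer, which leaves the tree unchanged, so a custom subclass would be substituted in practice.

```python
import ast


def run_file(path, transformer):
    """Parse the file at `path`, apply `transformer`, then compile and run it."""
    with open(path, encoding="utf-8") as f:
        source = f.read()

    tree = ast.parse(source, filename=path)
    tree = transformer.visit(tree)
    # Nodes created by the transformer need line numbers before compiling.
    ast.fix_missing_locations(tree)

    code = compile(tree, filename=path, mode="exec")
    namespace = {"__name__": "__main__", "__file__": path}
    exec(code, namespace)
    return namespace


def main(argv):
    """Hypothetical entry point, e.g. main(["test_runner.py", "test_module.py"])."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <file.py>")
        return 2
    # The base NodeTransformer is an identity transform; swap in a subclass here.
    run_file(argv[1], ast.NodeTransformer())
    return 0
```

A script built on this sketch would end with sys.exit(main(sys.argv)). Passing the real file name to both ast.parse and compile keeps tracebacks pointing at the original file.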
This example is intentionally minimal to remain approachable.
If this still feels a bit overwhelming, spending time looking at the ASTs in the explorer tool should help.
An interesting exercise could be to support attribute access (assert a.b == c.d) and indexing into lists (assert a[b] == c[d]).
The real-world pytest transform does a lot more, like showing intermediate values of computations/function calls.
Strictly speaking, pytest does not generate Python code. As we have seen, it transforms the AST instead.
But we can still use ast.unparse to get source code from the transformed AST.
Here's what it looks like for assert a == b (using pytest 7.4.3 with python 3.11.23):
import builtins as @py_builtins
import _pytest.assertion.rewrite as @pytest_ar
@py_assert1 = a == b
if not @py_assert1:
    @py_format3 = @pytest_ar._call_reprcompare(('==',), (@py_assert1,), ('%(py0)s == %(py2)s',), (a, b)) % {
        'py0': @pytest_ar._saferepr(a) if 'a' in @py_builtins.locals() or @pytest_ar._should_repr_global_name(a) else 'a',
        'py2': @pytest_ar._saferepr(b) if 'b' in @py_builtins.locals() or @pytest_ar._should_repr_global_name(b) else 'b'}
    @py_format5 = ('' + 'assert %(py4)s') % {'py4': @py_format3}
    raise AssertionError(@pytest_ar._format_explanation(@py_format5))
@py_assert1 = None
Manipulating code as data can make for some fun and powerful tools. Like most metaprogramming techniques, it should be used with great responsibility.
Even if you rarely use the ast module directly in application code, understanding Abstract Syntax Trees makes many of the developer tools we use less mysterious.