Python-BPF offers a new way to write eBPF programs entirely in Python, compiling them into real object files. This project is open-source and available on GitHub and PyPI. I wrote it alongside R41k0u.
Python-BPF is a published Python library with plans for further development towards production-ready use.
Before Python-BPF, writing eBPF programs in Python typically involved embedding C code within multiline strings, often using libraries like bcc
. eBPF allows for small programs to run based on kernel events, similar to kernel modules.
Here’s an example of how it used to be:
from bcc import BPF
from bcc.utils import printb
# define BPF program
prog = """
int hello(void *ctx) {
bpf_trace_printk("Hello, World!\\n");
return 0;
}
"""
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
# format output
while 1:
try:
(task, pid, cpu, flags, ts, msg) = b.trace_fields()
except ValueError:
continue
except KeyboardInterrupt:
exit()
printb(b"%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
This approach, while functional, meant writing C code within Python, lacking support from modern Python development tools like linters.
# load BPF program
b = BPF(text="""
#include <uapi/linux/ptrace.h>
BPF_HASH(last);
int do_trace(struct pt_regs *ctx) {
u64 ts, *tsp, delta, key = 0;
// attempt to read stored timestamp
tsp = last.lookup(&key);
if (tsp != NULL) {
delta = bpf_ktime_get_ns() - *tsp;
if (delta < 1000000000) {
// output if time is less than 1 second
bpf_trace_printk("%d\\n", delta / 1000000);
}
last.delete(&key);
}
// update stored timestamp
ts = bpf_ktime_get_ns();
last.update(&key, &ts);
return 0;
}
""")
The multiline C program approach allowed for features like BPF MAPS (hashmap type), map lookup, update, and delete, BPF helper functions (e.g., bpf_ktime_get_ns
, bpf_printk
), control flow, assignment, binary operations, sections, and tracepoints.
For production environments, eBPF programs are typically written in pure C, compiled by clang
into a bpf target object file, and loaded into the kernel with tools like libbpf
. This approach features map sections, license global variables, and section macros specifying tracepoints.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#define u64 unsigned long long
#define u32 unsigned int
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, 1);
__type(key, u32);
__type(value, u64);
} last SEC(".maps");
SEC("tracepoint/syscalls/sys_enter_execve")
int hello(struct pt_regs *ctx) {
bpf_printk("Hello, World!\\n");
return 0;
}
char LICENSE[] SEC("license") = "GPL";
Python-BPF brings the true eBPF experience to Python by allowing the exact same functionality to be replaced by valid Python code. This is a significant improvement over multiline C strings, offering support from existing Python tools.
from pythonbpf import bpf, map, section, bpfglobal, compile
from ctypes import c_void_p, c_int64, c_int32, c_uint64
from pythonbpf.helpers import ktime
from pythonbpf.maps import HashMap
@bpf
@map
def last() -> HashMap:
return HashMap(key_type=c_uint64, value_type=c_uint64, max_entries=1)
@bpf
@section("tracepoint/syscalls/sys_enter_execve")
def hello(ctx: c_void_p) -> c_int32:
print("entered")
return c_int32(0)
@bpf
@section("tracepoint/syscalls/sys_exit_execve")
def hello_again(ctx: c_void_p) -> c_int64:
print("exited")
key = 0
last().update(key)
ts = ktime()
return c_int64(0)
@bpf
@bpfglobal
def LICENSE() -> str:
return "GPL"
compile()
Python-BPF uses ctypes
to preserve compatibility, employs decorators to separate the BPF program from other Python code, allows intuitive creation of global variables, and defines sections and tracepoints similar to its C counterpart. It also provides an interface to compile and run in the same file.
Step 1: Generate AST
The Python ast
module is used to generate the Abstract Syntax Tree (AST).
Step 2: Emit LLVM IR
llvmlite
from Numba emits LLVM Intermediate Representation (IR) and debug information for specific parts like BPF MAPs. The .py
file is converted into LLVM Intermediate Representation.
Step 3: Compile LLVM IR
The .ll
file, containing all code written under the @bpf
decorator, is compiled using llc -march=bpf -O2
.
Step 4: Generate eBPF Object File The LLVM backend takes the IR and converts it into a bpf object file with eBPF bytecode, handling all optimizations.
Previous Python options for eBPF relied on bcc
for compilation, which is not ideal for production use. The only two real options for production-quality eBPF programs were aya
in Rust and Clang with kernel headers in C. Python-BPF introduces a third, new option, expanding the horizons for eBPF development.
It currently supports:
Thanks for reading my poorly written blog :)