Parallel Perl – Autoparallelizing interpreter with JIT

I have been doing AI with Perl for a few decades.

Prologue Since last time...

GPW 2016 · Nuremberg · "KI: Wie testet man ein Hirn?" (AI: How do you test a brain?)

That was 10 years ago. A lot happened.

"Patience you must have, my young Padawan"

This prologue is not about Perl — but it is necessary context.

The classified stuff

Some things are better left unspoken.

What I can show you today is roughly half of what happened.

→ moving on

PV & Energy Systems

Designed, built, automated

Victron ecosystem  ·  Perl monitoring & control

Reference S1 · Nuremberg

~24 kWp · 45 kWh · 80+% off-grid · feed-in

Reference J2 · Prague

40 kWp · 120 kWh · 100% off-grid

Villa-A

A technocrat's house — 2050s standard

Fiber-reinforced concrete · Geothermal (Erdwärmekörbe) · Capillary ceiling mats · Full home automation · Off-grid energy · 40 kWp solar · 120 kWh battery

Villa-A — Construction

Villa-A — Systems

  • Geothermal baskets (Erdwärmekörbe), heat pump: heating, cooling (passive)
  • "Climate mats" (Klimamatten): ceiling heating, cooling (BEKA → Berlin)
  • Ground-air heat exchanger (Luftbrunnen), controlled residential ventilation (KWL)
  • Private power plant (off-grid)
  • Waterworks / water treatment
  • Digital lighting control, DALI "Full-Stack DC" (DALI-SELV)
  • ... and what have you

...and it all needs to be controlled.

A house this complex needs an automation system

02 WHIP

Witty House Infrastructure Processor

PV — Perl Integration

First tools: Victron Modbus + ECS BMS — all in Perl

$ ecs_bms_tool -range 1-16          # query all battery modules
$ ecs_bms_tool -get cell_voltage -get cell_temperature
$ ecs_bms_tool -otype json           # JSON for pipeline integration

$ Wmodbus discover 192.168.2.0/24    # find Modbus devices on network
$ Wmodbus --host 192.168.2.201 --unit 2 read holding 0-10
$ Wmodbus --host 192.168.2.201 --profile vents-dbe900l monitor

ecs_bms_tool — ECS LiPro BMS management (SoC, cell voltage, balancing)
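
For readers who have not seen Modbus on the wire: a "read holding" call like the one above boils down to a 12-byte Modbus TCP request. The sketch below builds that frame with pack. The sub name is made up for illustration, it is not Wmodbus code, and reading "0-10" as 11 registers (inclusive range) is an assumption.

```perl
use strict;
use warnings;

# Build a Modbus TCP "Read Holding Registers" (function 0x03) request.
# MBAP header: transaction id, protocol id (always 0), length, unit id.
sub read_holding_request {
    my ($transaction_id, $unit, $start, $count) = @_;
    my $pdu = pack('C n n', 0x03, $start, $count);   # function, address, quantity
    # The length field counts the unit id byte plus the PDU.
    return pack('n n n C', $transaction_id, 0, 1 + length($pdu), $unit) . $pdu;
}

# Unit 2, registers 0..10 (assumed inclusive, so 11 registers).
my $frame = read_holding_request(1, 2, 0, 11);
print unpack('H*', $frame), "\n";   # 00010000000602030000000b
```

All fields are big-endian, which is why every multi-byte value uses the "n" template.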

Scripts work. But a house is more than solar panels.

Looking for a Smarthome

Preferably in Perl, obviously.

FHEM

FHEM — Perl. Active. Tried it. Suffered.

Home Assistant

Home Assistant — Python. Popular. Not Perl. Not 21st century enough.

WHIP

WHIP "I'll build my own."

The FHEM Experience

"We do not like CPAN" — dependencies create problems. So we reimplement everything ourselves. But worse.
"We do not like PBP" — contributions are done by amateurs. Too high expectations would kill contribution.
"Efficient algorithms are overrated" — "So what? That's 0.1s faster?"
"Tests? TDD? That's superfluous work!"
"I don't like you, you cannot use my GPL code"

FHEM SVN: svn.fhem.de/trac/browser/trunk/fhem

(FHEM people: no offense. Well, maybe a little.)

MySensors

Nice idea. Wrong execution.

Open-source, DIY, community-driven

Tree topology with auto-routing & self-healing

Up to 254 nodes × 254 sensors — decent scale

Arduino — ATmega328? For a house? In 2020?

Arduino software model — Endless loop with stuff in it.

RS485 / nRF24L01+ — Master-Slave Architecture

Text protocol — semicolon-delimited ASCII over serial. In 2020.

No autonomous operation — nodes depend on gateway/controller
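
To make the "semicolon-delimited ASCII" point concrete: a MySensors serial message has six fields, node-id;child-sensor-id;command;ack;type;payload. A minimal parser sketch follows. The sub name and the returned hash layout are illustrative, not part of any MySensors library.

```perl
use strict;
use warnings;

# Command numbers 0..4 as defined by the MySensors serial protocol.
my @COMMAND = qw(presentation set req internal stream);

sub parse_mysensors {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                       # strip the line terminator
    my ($node, $child, $cmd, $ack, $type, $payload) = split /;/, $line, 6;
    return {
        node    => $node,
        child   => $child,
        command => $COMMAND[$cmd] // 'unknown',
        ack     => $ack,
        type    => $type,
        payload => $payload // '',
    };
}

my $msg = parse_mysensors("12;6;1;0;0;36.5\n");
print "node $msg->{node} child $msg->{child}: $msg->{command} $msg->{payload}\n";
```

Every value travels as text, so each node has to format and parse numbers on a microcontroller.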

MySensors DIY node

I felt there had to be something better.

Birth of a Node

CAN bus instead of RS485

Multi-master · Inherent collision resolution · Resilient

1 MBit instead of EIB/KNX 9600 baud

Good for 20–30m runs. Plenty for a house.

STM32F103 instead of ATmega328

72 MHz ARM Cortex-M3 · 7× faster than Arduino

FreeRTOS + libopencm3 instead of endless loops

Real tasks · Priorities · Preemption · Hardware abstraction

STM32 Black Pill

RobotDyn Black Pill · STM32F103C8T6

STM32 Blue Pill

Blue Pill · STM32F103C8T6

STM32 Green Pill

Green Pill · STM32F103C8T6 · DIY

Why CAN?

Why CAN? Hardware arbitration (CSMA/CR) · true multi-master · 1 Mbit/s · differential · industrial grade

Why not WiFi/Zigbee? No batteries to die. No mesh to collapse. Building for 50 years, not 5.

Why not RS485? No arbitration. Master-slave only. Two nodes transmit = garbage.

Why not KNX? 9600 baud (1990s design). Expensive. Closed ecosystem.
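
What "hardware arbitration" means in practice: all contending nodes send their identifier bit by bit, a 0 (dominant) overrides a 1 (recessive) on the wire, and a node that sent 1 but reads back 0 stops transmitting. The lowest identifier wins and no frame is destroyed. A small simulation sketch, assuming standard 11-bit identifiers (the IDs are made up):

```perl
use strict;
use warnings;

# Simulate CAN bitwise arbitration over 11-bit identifiers.
sub arbitrate {
    my @contenders = @_;
    for my $bit (reverse 0 .. 10) {              # most significant bit first
        # Bus level is the wired-AND of all transmitted bits (0 = dominant).
        my $bus = 1;
        $bus &= ($_ >> $bit) & 1 for @contenders;
        # Nodes that sent recessive (1) but read dominant (0) back off.
        @contenders = grep { (($_ >> $bit) & 1) == $bus } @contenders;
    }
    return $contenders[0];
}

printf "winner: 0x%03X\n", arbitrate(0x65A, 0x123, 0x124);   # winner: 0x123
```

The losers retry once the bus is idle again. On RS485 the same situation corrupts both frames.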

Birth of a Hub

So you have 20 nodes — now what?

WHIP Hub Assembly

Hub assembly · DIN rail mount

Waveshare 2-CH CAN HAT

RasPi 4B/5 · 2-ch CAN HAT · Relay Board · DIN Rail Mount · CAN/IP & CAN/CAN Gateway

WHIP Architecture

Nodes (C / embedded): STM32 MCUs · FreeRTOS · 1 Mbit CAN bus · Autonomous

Hubs (Perl): RasPi · CAN/IP gateway · Hub aggregation · Protocol bridges · Mojolicious

Server (Perl): Orchestration · External connectivity

Higher layers are always a supplement, never a requirement.

Nodes — STM32 + FreeRTOS

Hardware

STM32F103 (Cortex-M3, 72 MHz)

Software

FreeRTOS · libopencm3

115+

sensor/actuator modules

🌡️ BME280 · DS18x20 · SHT3x · NTC

INA219 · INA226 · ACS712 · ADC

💡 DALI · WS281x · PWM dimmer · SSR

🔌 PCF857x · MCP23017 · relay · GPIO

📡 LoRa · Modbus RTU · 1-Wire · SPI

🖥️ SSD1306 · ST7735 · status LEDs

🍃 SCD4x · SGP4x · PMS5003 · SEN5x

🛸 AS3935 (franklin) · MLX90614 · VL53L0x · HX711

Dependency resolver inspired by Linux Kconfig

~5 modules per node → 153,476,148 combinations
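
Where that number comes from: it is the binomial coefficient "115 choose 5", assuming exactly 115 modules and exactly 5 per node. A quick check:

```perl
use strict;
use warnings;

# n choose k, computed stepwise. Each intermediate value is itself a
# binomial coefficient, so every division is exact.
sub choose {
    my ($n, $k) = @_;
    my $result = 1;
    $result = $result * ($n - $k + $_) / $_ for 1 .. $k;
    return $result;
}

print choose(115, 5), "\n";   # 153476148
```

No test matrix covers that many combinations, which is why the dependency resolver matters.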

Ganglion

Ganglion = GANG of Lightweight I/O Nodes — insect-brain model.

Ganglion — In Action

DEF LightTimeout = 300       # 5 minutes

# Motion detected: light on, start timer
IF motion:detected THEN lights:on; SET $T_0 = LightTimeout

# Timer expired: light off
IF !$T_0 THEN lights:off

# Cross-node: kitchen smoke → alarm everywhere
DEF Kitchen = 42
IF Kitchen:smoke:detected THEN buzzer:alarm(1)

Hubs — a Pantheon

Specialized RasPi hubs. Named by function, not by accident.

Raijin

⚡ Thunder god — energy: Victron, BMS, MPPT, 120 kWh batteries

Lucifer

💡 Light bearer — DALI lighting: 4 buses, scenes, presence simulation

Bragi

🎵 Norse god of poetry — multiroom audio, voice, AI assist

Gaia

🌿 Earth goddess — greenhouse, garden, pond, irrigation

Tyr

⚔️ God of war — ...you can guess.

No hub is a single point of failure. Each domain runs independently.

SELV-DALI — Lighting without mains

SELV = Safety Extra Low Voltage. Under 60V DC. Safe to touch.

The trick: Entire lighting chain runs from battery storage. 48V → 24V DC/DC → LED. No 230V AC anywhere.

DALI controls at 16V. Switches, sensors, dimmers — all SELV.

Inverters fail? Lights stay on — they bypass AC entirely.

Switch next to the bathtub? No problem. No electrician needed.

WHIP — Protocols & Integrations

Protocols

CAN bus 1 Mbit · Modbus TCP/RTU · DALI · MQTT · SNMP · I2C · 1-Wire

Modbus

17 of 21 function codes · 869 tests · 91% coverage

30+ external integrations

Victron VRM · MasterTherm · PVGIS · Discord · Nextcloud · Proxmox · UniFi · ...

All protocol handlers in Perl · Mojolicious async I/O

WHIP — In production

Villa-A (Prague) — completely off-grid

  • 40 kWp solar · 120 kWh LiFePO4 · 3× Multiplus-II 10kVA
  • MasterTherm heat pump · capillary ceiling heating/cooling
  • DALI lighting across 4 buses · distributed CAN nodes

Villa-B (Germany) — same concept, different config

Two deployments = real generalization, not "works on my machine"

Invisible when it works. Competent when it matters. Built for decades, not warranties.

03 AI does Perl

Using AI to write Perl — the practical reality

AI contributed...

A lot of Perl prototypes — some grew to standard tools

  • Our own PVGIS, but better (many roofs)
  • Modbus and CAN bus CLI commands & introspection
  • Heat loss / inrush simulations (better than "energy experts")

FreeRTOS / libopencm3 source

  • Modularized firmware for STM32 nodes
  • Over 100 modules (combinatorics!)
  • Tests, build system

A lot of Perl code

  • API endpoints to ... everything
  • Discord, Reddit, Twitter, Kraken, Ollama, Proxmox
  • AWS, Azure, Anthropic, Kodi, Nextcloud, ... you name it!

"Small" side projects

MIB Parser

SNMP::MIB::Compiler: best MIB parser. Period. (open source)

Grpc::FFI

gRPC for Perl, because everything else was dead (open source)

The CPAN Situation

2025-09-28

"User 'PETAMEM' set to nologin. Your account may have been included in a precautionary password reset in the wake of a data breach incident at some other site. Please talk to modules@perl.org to find out how to proceed."

→ Talked to modules@perl.org. No answer.

2025-10-01

"On Steffen Winkler's recommendation I'm forwarding this to you, because there has been no reaction from modules@perl.org so far. I'd like to toss a few modules onto CPAN again. :-)"

→ To Andy Koenig. No answer.

2025-10-06

"Hi Sören. Do you happen to know where I could get hold of one Andy König, or anyone else who can help with PAUSE/CPAN? We'd like to get our modules back into shape, but [...] and nobody stirs at modules@perl.org or andyk@cpan.org."

→ To Sören Laird, LinkedIn. No answer.

SNMP & MIBs — quick primer

SNMP

Simple Network Management Protocol — how you monitor and manage network devices. Routers, switches, firewalls, UPS, printers — anything with an IP.

MIB

Management Information Base — the schema. Defines what each device can report: CPU load, interface counters, temperature, error rates, ...

The problem

Thousands of vendor MIBs. Written in ASN.1. Riddled with vendor deviations from the standard. Every monitoring system needs a parser — and every parser struggles.

SNMP::MIB::Compiler

2 days with AI · CPAN module was 93% working · targeted fixes, no rewrite

Parser Language Failures Pass rate
pysmi Python 296 93.8%
gosmi Go 91 98.1%
Ours Perl 39 99.2%

4740 MIBs · 301 fixes · 52 fewer failures than Go
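
The pass rates follow directly from the failure counts and the corpus size of 4740 MIBs. A short recomputation, using only the figures from the table:

```perl
use strict;
use warnings;

my $total    = 4740;
my @failures = ([pysmi => 296], [gosmi => 91], [Ours => 39]);

# Pass rate in percent, rounded to one decimal place.
sub pass_rate {
    my ($failed, $all) = @_;
    return sprintf '%.1f', 100 * ($all - $failed) / $all;
}

printf "%-6s %4d failures  %s%%\n", $_->[0], $_->[1], pass_rate($_->[1], $total)
    for @failures;
print "gap to gosmi: ", 91 - 39, " failures\n";
```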

GitHub: github.com/petajoulecorp/SNMP-MIB-Compiler

Grpc::FFI

Wanted gRPC in Perl. Everything on CPAN: dormant, dead, or broken.

So we built it. From scratch. FFI::Platypus bindings to the gRPC C API.

Learning path: UUID::FFI → SQLite::FFI → Grpc::FFI

326 tests passing · zero memory leaks · zero crashes

Cross-language: Perl client ↔ Java/Go servers — working

Streaming: unary, client, server, bidirectional

85% production ready · ~43 implementation files

And yet — this was a prelude

What if FFI wasn't just a tailored library connector...

Navigator / Orchestrator

AI doesn't do this on its own.

Human: strategy, architecture, learning path, priorities

AI: execution, documentation, pattern learning, iteration

gRPC example: I decided "start with UUID, then SQLite, then gRPC"

Without the navigator — the AI builds impressive things that go nowhere.

So... AI is quite good at Coding.

But how far can this actually go?

04 AI does Perl

Turning the predicate around.

pperl

PetaPerl  /  ParallelPerl

A Perl 5 interpreter — designed by humans.

Written in Rust — by many AI agents.

Serious — no toy or academic exercise.

A Perl 5 interpreter Platform — designed by humans.

pperl — Not the first attempt

Topaz

1999 · C++ rewrite · Chip Salzenberg · abandoned

B::C / perlcc

1996–2016 · Perl-to-C compiler · dead

cperl

2015–2020 · Perl 5 fork · Reini Urban · dormant

RPerl

Restricted Perl → C++ · Will Braswell · dormant

WebPerl

Perl 5 → WebAssembly · runs in browser · semi-active

PerlOnJava

Perl 5 on JVM · Flavio Glock · active — talk at this GPW!

Common failure mode: underestimating Perl 5's complexity

pperl — Scope

Perl 5.42 — ish

Compatibility: strive for maximum Perl 5 compliance, currently 5.42

XS: no, but yes

Linux only — all architectures

We really don't care about use v5.xx

pperl — Status

22,000+

tests total

~61–400 failures — give or take

Performance: good, bad and ugly

Quotes from the AI

13095 pass (+25 from previous 13070), 31 fail (down from 46!). The File::Path native implementation not only works, it unblocked 15 previously-failing tests that depended on File::Path. Zero regressions.

pperl — Benchmarks

Benchmark perl5 pperl ratio
list_util::sum 191.8K 372.8K 1.9x
list_util::min 199.8K 772.9K 3.9x
list_util::max 201.3K 673.7K 3.3x
list_util::product 2.7M 4.0M 1.5x

Native Rust implementations — not XS, not C

pperl — Beyond Perl5

Maximum compatibility. But more.

Autoparallelization — for/map/grep via Rayon · transparent · no threads pragma

JIT Compilation — Cranelift · hot codepath detection · native code at runtime

Auto-FFI — call any C library · no XS · no compilation · Peta::FFI namespace

Pre-Compile — .plc blobs · skip parsing · near-instant startup

Daemonize — emacs-style daemon/client · shared memory · zero cold start

Autoparallelization

Powered by Rayon — Rust's data-parallelism library

Work-stealing scheduler

One-line change in Rust

Guaranteed data-race freedom

# This just works. In parallel.
my @results = map { expensive_computation($_) } @large_list;

# No threads. No MCE. No forks.
# pperl detects safe loops → Rayon handles the rest.

--parallel flag · list ≥ 1000 items · no shared mutation
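
What "no shared mutation" means for a loop body, shown in plain Perl. The first map only reads $_ and returns a value. The second writes to a variable outside the block, so its iterations depend on order. How pperl actually classifies these two is an assumption based on the conditions listed above, and expensive_computation is a stand-in.

```perl
use strict;
use warnings;

sub expensive_computation { my ($n) = @_; return $n * $n + 1 }   # stand-in

my @large_list = (1 .. 5000);        # above the 1000-item threshold

# Candidate: each iteration depends only on $_, nothing outside is written.
my @results = map { expensive_computation($_) } @large_list;

# Not a candidate: every iteration updates $running_total, so order matters.
my $running_total = 0;
my @prefix_sums = map { $running_total += $_ } @large_list;

print "$results[-1] $prefix_sums[-1]\n";   # 25000001 12502500
```

Both loops run unchanged on perl5. Only the first could be split across cores without changing the result.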

JIT Compilation

Just-In-Time — compile to machine code while running

How it works in pperl:

  1. Interpreter runs normally — profiling hot paths
  2. Hot loop detected → lower to Cranelift IR
  3. Cranelift compiles IR → native machine code
  4. Next iteration runs as native code — zero dispatch overhead

Cranelift — the compiler backend behind Wasmtime and Rust's alternative codegen.

# pperl detects this as a hot loop pattern
my $sum = 0;
for my $i (1 .. 1_000_000) {
    $sum += $i;
}
# → Cranelift compiles to native machine code
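
The four steps can be mimicked in a toy, entirely in Perl: count how often a code path runs, and once it crosses a threshold, swap the slow walk over an op list for one precompiled closure. This is only a sketch of the tiering idea. The threshold, the op format and all names are made up, and a string eval stands in for Cranelift.

```perl
use strict;
use warnings;

my $HOT_THRESHOLD = 3;      # hypothetical: runs before a path counts as hot
my (%runs, %compiled);

# Slow path: walk the op list on every call.
sub interpret {
    my ($ops, $x) = @_;
    for my $op (@$ops) {
        my ($name, $arg) = @$op;
        $x = $name eq 'add' ? $x + $arg : $x * $arg;
    }
    return $x;
}

# "Compile" step: fold the op list into a single Perl closure.
sub compile {
    my ($ops) = @_;
    my $expr = '$_[0]';
    for my $op (@$ops) {
        my ($name, $arg) = @$op;
        $expr = $name eq 'add' ? "($expr + $arg)" : "($expr * $arg)";
    }
    my $code = eval "sub { $expr }" or die $@;
    return $code;
}

sub run {
    my ($id, $ops, $x) = @_;
    return $compiled{$id}->($x) if $compiled{$id};     # fast path
    $compiled{$id} = compile($ops) if ++$runs{$id} >= $HOT_THRESHOLD;
    return interpret($ops, $x);
}

my @ops = (['add', 2], ['mul', 3]);
print join(' ', map { run('loop1', \@ops, $_) } 1 .. 5), "\n";   # 9 12 15 18 21
```

Calls four and five go through the closure and give the same results as the interpreted path.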

JIT — First Win

Inner loop JIT — single hot loop compiled to native code

Benchmark     perl5    pperl interpreted    pperl JIT    vs perl5
Mandelbrot    133ms    1493ms               41ms         3.2× faster
Ackermann     13ms     630ms                12ms         1.1× faster

The JIT fired and the test passes! The answer is correct (500000500000).

Good. But only the innermost loop is compiled. What about nested loops?

$py = 0;
while ($py < $height) {
    $y0 = $y_min + $py * $y_step;
    $row_off = $py * $width;
    $px = 0;
    while ($px < $width) {
        $x0 = $x_min + $px * $x_step;
        $zr = 0.0; $zi = 0.0; $iter = 0;
        while ($iter < $max_iter) {
            $r2 = $zr * $zr;
            $i2 = $zi * $zi;
            last if ($r2 + $i2 > 4.0);
            $zi = 2.0 * $zr * $zi + $y0;
            $zr = $r2 - $i2 + $x0;
            $iter++;
        }
        $frame[$row_off + $px] = $color_lut[$iter];
        $px++;
    }
    $py++;
}

JIT — The Code

Mandelbrot set

Pure Perl.
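
The slide shows only the loop nest. For anyone who wants to run it, here is a self-contained variant with the same escape-time arithmetic. The image size, coordinate window and ASCII output are arbitrary choices for illustration, not the benchmark setup.

```perl
use strict;
use warnings;

# Innermost loop: iterate z = z^2 + c until |z|^2 > 4 or the budget runs out.
sub escape_iterations {
    my ($x0, $y0, $max_iter) = @_;
    my ($zr, $zi, $iter) = (0.0, 0.0, 0);
    while ($iter < $max_iter) {
        my $r2 = $zr * $zr;
        my $i2 = $zi * $zi;
        last if $r2 + $i2 > 4.0;
        $zi = 2.0 * $zr * $zi + $y0;
        $zr = $r2 - $i2 + $x0;
        $iter++;
    }
    return $iter;
}

my ($width, $height, $max_iter) = (60, 24, 50);
my ($x_min, $x_max, $y_min, $y_max) = (-2.0, 1.0, -1.2, 1.2);
my $x_step = ($x_max - $x_min) / $width;
my $y_step = ($y_max - $y_min) / $height;

# Points that never escape are drawn as '#'.
for my $py (0 .. $height - 1) {
    my $y0 = $y_min + $py * $y_step;
    print join('', map {
        escape_iterations($x_min + $_ * $x_step, $y0, $max_iter) == $max_iter ? '#' : '.'
    } 0 .. $width - 1), "\n";
}
```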

JIT — Full Nested

All 3 loop levels compiled as one native function

Mandelbrot 1000×1000    perl5       pperl interpreted    pperl JIT    vs perl5
Wall time               12,514ms    -                    163ms        76× faster

200 million escape iterations of float arithmetic.

Perl. With JIT. That's a sentence nobody expected.

Autoparallel JIT — Full Win

JIT + Rayon: compile to native, then split across cores

Mandelbrot    perl5       pperl JIT    pperl JIT + 8 threads    vs perl5
1000×1000     12,514ms    163ms        29ms                     431× faster
4000×4000     ~200s       2,304ms      342ms                    ~580× faster

JIT alone: 76×. Adding 8 threads: another 5.6× to 6.7× on top.

Demo Time!

Auto-FFI

No XS. No Inline::C. No compilation. Just call C.

# Layer 0 — Raw: any library, you provide type signatures
use Peta::FFI qw(dlopen call);
my $lib = dlopen("libz.so.1");
my $ver = call($lib, "zlibVersion", "()p");
say "zlib: $ver";    # 1.3.1
# Layer 1 — Pre-baked: curated signatures, zero ceremony
use Peta::FFI::Libc qw(getpid strlen strerror uname);
say strlen("hello");           # 5
my @info = uname();
say "$info[0] $info[2]";      # Linux 6.18.6-arch1-1

Pack-style type codes: (p)L = strlen(const char*) → size_t
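
Reading such a signature is mechanical: argument codes sit inside the parentheses, the return code follows. A minimal parser sketch, assuming one letter per type. Only the codes p and L appear on the slide, and the sub below is illustrative, not the Peta::FFI implementation.

```perl
use strict;
use warnings;

# Split a "(args)ret" signature into its argument codes and return code.
sub parse_signature {
    my ($sig) = @_;
    my ($args, $ret) = $sig =~ /\A\(([A-Za-z]*)\)([A-Za-z])\z/
        or die "malformed signature '$sig'\n";
    return { args => [ split //, $args ], ret => $ret };
}

for my $sig ('(p)L', '()p') {
    my $parsed = parse_signature($sig);
    printf "%-5s args=[%s] ret=%s\n", $sig, join(',', @{ $parsed->{args} }), $parsed->{ret};
}
```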

Auto-FFI — Details

Powered by libffi — any signature works, no pre-generated stubs

Layer                  Scope                    Mechanism
Raw (Layer 0)          Any .so on the system    dlopen + dlsym + libffi call frame
Pre-baked (Layer 1)    libc, libuuid, ...       Direct Rust libc::* calls, zero overhead
Discovery (Layer 2)    System-wide scan         scan() → hashref of { soname => path }

# Layer 2 — What's on this system?
use Peta::FFI qw(scan dlopen call);
my $libs = scan();
say scalar(keys %$libs), " libraries found";
if (exists $libs->{"libz.so.1"}) {
    my $z = dlopen("libz.so.1");
    say "zlib: ", call($z, "zlibVersion", "()p");
}

Libc: ~30 functions (process, strings, env, math, file, time)

Bytecode Cache (.plc)

Like Python's .pyc — but for Perl. Opt-in.

# Default: no caching (safe for development)
$ pperl script.pl

# Enable: compile once, load from cache on subsequent runs
$ pperl --cache script.pl

# Invalidate all caches
$ pperl --flush

First run: parse → codegen → execute → save .plc

Second run: load .plc → execute (no parsing, no codegen)

Bytecode Cache — Details

Storable-model: bincode deserializes directly to final runtime types. Zero intermediate conversion.

Benchmark perl5 pperl pperl --cache
three_modules 22.3ms 12.6ms 9.9ms
mixed_native_fallback 26.3ms 13.0ms 10.0ms
deep_deps 18.1ms 13.1ms 9.9ms

Net module-loading cost: 33–37% faster with cache. Biggest win on fallback modules. Native Rust modules already near-zero cost.

SHA-256 keyed · mtime + version validation · aggressive format versioning
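
How such a validation could look, as a sketch in plain Perl. What exactly pperl hashes is not stated on the slide. Here the key is assumed to be the SHA-256 of the script path, and an entry counts as valid only if key, format version and mtime all match. All names and the version number are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Temp qw(tempfile);

my $FORMAT_VERSION = 1;      # hypothetical cache format version

sub cache_key { my ($path) = @_; return sha256_hex($path) }

# Record what the cache needs to know to detect a stale entry later.
sub make_entry {
    my ($path) = @_;
    return { key => cache_key($path), mtime => (stat $path)[9], version => $FORMAT_VERSION };
}

sub entry_is_valid {
    my ($entry, $path) = @_;
    return $entry->{key} eq cache_key($path)
        && $entry->{version} == $FORMAT_VERSION
        && $entry->{mtime} == (stat $path)[9];
}

my ($fh, $script) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "print 'hello';\n";
close $fh;

my $entry = make_entry($script);
print entry_is_valid($entry, $script) ? "hit\n" : "miss\n";   # hit
utime(time, time + 10, $script);                              # simulate an edit
print entry_is_valid($entry, $script) ? "hit\n" : "miss\n";   # miss
```

A stale entry is simply ignored and rebuilt, so a wrong cache can cost time but not correctness.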

Daemonize

Emacs-style daemon/client model

$ pperl --daemon script.pl   # compile, warm up, listen
$ pperl --client script.pl   # connect → fork → run → respond
$ pperl --stop   script.pl   # clean shutdown

First run: parse → codegen → execute (warm-up) → listen

fork() gives each client a fresh address space
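
Why fork() prevents state leakage, shown in plain Perl: the child works on a copy-on-write image of the warmed-up parent, so whatever a request changes disappears when the child exits. A minimal sketch with a pipe instead of the Unix socket. The state hash and sub name are made up.

```perl
use strict;
use warnings;

$| = 1;   # flush STDOUT immediately so children never inherit pending output

my %state = (warmed_up => 1, requests => 0);   # built once in the "daemon"

sub handle_in_child {
    my ($request) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork() // die "fork: $!";
    if ($pid == 0) {                     # child: private copy of %state
        close $reader;
        $state{requests}++;              # these changes stay in the child
        $state{leak} = $request;
        print {$writer} "handled $request (requests=$state{requests})\n";
        close $writer;
        exit 0;
    }
    close $writer;                       # parent: collect the reply
    my $reply = <$reader>;
    close $reader;
    waitpid($pid, 0);
    chomp $reply;
    return $reply;
}

print handle_in_child('job-1'), "\n";    # handled job-1 (requests=1)
print handle_in_child('job-2'), "\n";    # handled job-2 (requests=1)
print "parent requests=$state{requests}\n";   # parent requests=0
```

Both jobs see requests=1: each one started from the same clean snapshot.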

Daemonize — Details

Benchmark                perl5     pperl     --cache    --daemon
5 native modules         15.0ms    4.3ms     4.3ms      4.6ms
fallback + native mix    23.5ms    15.8ms    ~10ms      5.0ms (3.2×)

Eliminates both startup costs: process creation (~3-4ms) + module compilation (0-15ms)

Unix domain socket · JSON wire protocol · copy-on-write pages via fork()

Daemonize — Prior Art

Solution        Scope          Isolation             State leakage    Status
PPerl           General CLI    None                  Yes              Dead (2004)
SpeedyCGI       CGI            None                  Yes              Dead (2003)
mod_perl        Apache         Per-child             Per-request      Maintained
Starman         PSGI           Per-worker            Per-request      Maintained
FastCGI         Web            Per-process           Per-request      Maintained
pperl daemon    General CLI    Per-request (fork)    None             Active

All prior solutions: same interpreter across requests — state leakage by design

Future pperl

Seamless GPU — restricted Perl → OpenCL/HIP/Vulkan/CUDA kernel · same code, GPU execution

pperl-mini — tailored and scaled down versions. Maybe on a Raspberry Pico one day?

pperl-compiler — Maybe code running on a STM32 one day?

When to use pperl

Good fit:

  • Workloads that benefit from JIT and/or autoparallelization
  • Scripts using native builtins (50+ Rust modules, fast)
  • Fast startup — inherently ~2× faster than perl5, plus --cache
  • pperl-specific features: Auto-FFI, Daemonize, Bytecode Cache
  • Security: different codebase — unlikely to share CVEs with perl5
  • Smaller, less complex scripts

Not yet:

  • Large, complex codebases — edge cases where pperl differs from perl5
  • We strive for maximum compatibility, but we're not 100% there yet

Rule of thumb: the longer and more complex the script, the more likely it hits one of those edge cases.

Correctness Case Study

How serious is "maximum compatibility"?

The bug: $, (OFS) vs $\ (ORS) in print

pperl checked both with the same flag mask. Perl5 doesn't.
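
As a reminder of what the two variables do in everyday use: $, goes between the items of a print list, $\ goes after the whole list. A small demonstration in plain Perl, printing into an in-memory filehandle (the helper sub is illustrative):

```perl
use strict;
use warnings;

# Print @items with the given separators and return what was written.
sub render {
    my ($ofs, $ors, @items) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "open: $!";
    {
        local $, = $ofs;    # output field separator: between list items
        local $\ = $ors;    # output record separator: after the whole list
        print {$fh} @items;
    }
    close $fh;
    return $out;
}

print render('-', "!\n", qw(a b c));           # a-b-c!
print render(undef, undef, qw(a b c)), "\n";   # abc
```

The bug only concerns how print decides whether each variable counts as set, not what it does with them.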

perl5 — $, (OFS)

if (SvGMAGICAL(ofs) || SvOK(ofs))

Checks get-magic AND ok-flags

perl5 — $\ (ORS)

if (PL_ors_sv && SvOK(PL_ors_sv))

Checks ok-flags only. No get-magic.

pperl had:

// Same mask for both — SVS_GMG included for ORS. Wrong.
if flags & (SVF_IOK | SVF_NOK | SVF_POK | SVF_ROK | SVS_GMG) != 0

Practical impact: near zero.

To trigger this, you'd need a tie on $\ whose FETCH returns undef, while the underlying SV has get-magic set but none of IOK/NOK/POK/ROK — and then call print. Nobody writes this. Nobody has ever written this.

We fixed it anyway.

The depth of compatibility is the product's guarantee.

I have been doing

Get it here:

perl.petamem.com

Danke.

Richard Jelinek  ·  rj@petamem.com

One more thing.

psh

An interactive Perl shell

ls "-la";                    # it's just a sub call
cd "/tmp";                   # chdir wrapper
ps "aux";                    # system command

# But you're already in a scripting language:
for my $f (glob("*.log")) {
    if (-M $f > 7) {
        rm $f;
        say "cleaned $f";
    }
}

Object pipes — pass data structures, not text:

ps() | grep { $_->{mem} > 100_000 }
     | sort { $b->{cpu} <=> $a->{cpu} };
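
For comparison, the same filter-and-sort over structured records in plain Perl, where the chain reads right to left instead of left to right. The process records are made-up sample data. Only the field names mem and cpu come from the example above.

```perl
use strict;
use warnings;

# Hypothetical process records, as ps() might return them.
my @procs = (
    { pid => 101, cmd => 'perl',  mem => 250_000, cpu => 12.5 },
    { pid => 102, cmd => 'sshd',  mem =>  40_000, cpu =>  0.1 },
    { pid => 103, cmd => 'pperl', mem => 180_000, cpu => 48.0 },
);

# Keep the memory-heavy processes, busiest first.
my @heavy = sort { $b->{cpu} <=> $a->{cpu} }
            grep { $_->{mem} > 100_000 } @procs;

printf "%d %s %.1f\n", @{$_}{qw(pid cmd cpu)} for @heavy;
```

No text parsing of ps output is needed at any step, which is the point of object pipes.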

PowerShell's philosophy · Perl's text power · pperl's JIT speed