
Extracting a Beat Detection Algorithm from DJ Software With Qwen3.6:35b on a Mid-Tier Gaming PC

Background

Alibaba released the 35B version of Qwen3.6 about a week and a half ago, and I pulled it down with Ollama to try it out. I have been doing this for a while, though I have not had much use for local models in the last few years aside from chatbot toys.

But I remain interested in them, because even as language models get better, hosted LLMs will always come with incredible privacy, security, and trust baggage that cannot be reasonably surmounted.

Without spending thousands of dollars on GPUs, though, I haven't been able to run any models useful for coding or sysadmin tasks, which are the main things I want to use them for. Sysadmin tasks especially, because then I do not need to worry about leaking secrets to a provider.

I'm also stingy cheap frugal, so I don't have a Claude Code account, and the other providers are too rich for my blood too. It might be different if I was generating revenue, but most of the use I have for LLMs outside of work is for joke programs that are not worth $20/mo. I mean, that's more than Netflix.

An old friend from my 2013 batch at Recurse was muttering about wanting to slice a beat detection algorithm out of Mixxx for use as a general-purpose library and tool, and I suggested that a coding agent could probably simplify the task, but the token expense of using a hosted LLM was a barrier.

It happens that I have a decent gaming PC that I built a few years ago. As mentioned, I'm stingy frugal, so I did not build a top-of-the-line system at the time, even for gaming, and it's been a couple of years. I didn't really build this PC with AI in mind at all, although I did buy a decent amount of RAM (for VMs, I thought).

Running Qwen3.6 on My Gaming PC

I had downloaded Qwen3.6 a few days prior and saw that, in conjunction with the Pi coding agent, it could successfully make tool calls and run with performance I considered tolerable for extremely trivial tasks, like adding log messages to an existing codebase. Not great, mind you; the performance hearkened back to downloading Firefox on dial-up back home in the Smokies twenty years ago: very slow, but fast enough to work with.

I haven't been rigorously measuring tokens per second, and with the technology at this age I am not sure many of us have a great intuition for that number yet anyway, so I will simply share my system's specifications and this gif to show what kind of performance this yields:

Screencast_20260501_120449.gif

You have to be patient watching the gif, just like I was during this experiment, but after a little while you'll see pi successfully do some file reads and output some text. It's not great, but it is fast enough that it does eventually complete tasks and progress feels meaningful.

Here are my unimpressive system specifications that enable this "blistering" performance:

 CPU: AMD Ryzen 5 5600X
 GPU: AMD Radeon RX 6700 XT (12GB of VRAM)
 RAM: 32GB DDR4

This is hardly top of the line hardware, which is part of what is impressive and exciting to me about this experiment. It's more than a lot of folks have, but it's a far cry from buying a tinybox or even a DGX Spark or a 5090, all of which seem like table stakes if you are serious about local inference, and all of which are very expensive for mere mortals with grocery budgets. Even my gaming PC is a level of compute normally only owned by an enthusiast.

When I run Qwen3.6, Ollama distributes the model between my RAM and my VRAM. This works most effectively with the 35B-parameter MoE model: the difference between the performance I get with the 27B-parameter dense model and the 35B MoE model is significant, literally the difference between usable and useless. TL;DR: if you don't have a lot of VRAM, Mixture-of-Experts models will yield better performance than dense ones that don't fit entirely in VRAM.

Here is my ollama ps output:

  NAME           ID              SIZE     PROCESSOR          CONTEXT    UNTIL
  qwen3.6:35b    07d35212591f    31 GB    61%/39% CPU/GPU    128000     29 minutes from now
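
That CPU/GPU split lines up with the hardware. Here is a back-of-envelope sketch; Ollama actually splits by layers and also reserves VRAM for the KV cache, so treat this as an approximation rather than how Ollama computes the number:

```python
# The 31 GB model image fills the 12 GB of VRAM first;
# the remainder spills into system RAM.
model_gb = 31
vram_gb = 12

gpu_share = vram_gb / model_gb  # fraction that fits on the GPU
print(f"{1 - gpu_share:.0%}/{gpu_share:.0%} CPU/GPU")  # 61%/39% CPU/GPU
```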

Understanding the Problem

Mixxx is DJ mixing software. I hadn't heard of it before this problem was presented to me, but my understanding is that one of the many features it offers is the ability to detect BPM from a piece of audio. My friend wanted to extract just that part of the source code out into something that could be used in a standalone way.

This appeared to me as a search problem, fundamentally, so I volunteered to see if I could get Qwen to slice the functionality out of Mixxx for me. This would be a real test of Qwen 35B since I do not know C++ and would only be able to offer general guidance to the model.

The Easy Part, Surprisingly: Extracting the Beat Detection Algorithm and Estimations

I started by cloning the Mixxx source code, creating a new subdirectory inside it for the distilled application, and starting up pi. My initial prompt gave the model a clue about the algorithm to find:

  this codebase contains a BPM (beats per minute) detection algorithm of tracks dragged into the software. The algorithm it uses is the \"Queen Mary's\" Beat Detection Algorithm from the VAMP Plugin set.  Extract all of the relevant code path out so it may be reused in a new C++ program. Use the kagi-fastgpt tool to learn anything you need about plugin / algorithm that you need to find it in this source.

After giving it those keywords, it took about an hour for most of the relevant files to be discovered and copied into a new lib/ directory, and for their include paths to be updated. The model successfully built the libraries, so I instructed it to write a wrapper so we could test with a .wav file we created of a recording of a metronome with a known BPM.
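
If you want a fixture like this without recording anything, a synthetic click track works just as well. A minimal sketch (the filename, mono format, and click shape here are my own choices, not what we actually used):

```python
import math
import struct
import wave

def write_metronome_wav(path, bpm=60, seconds=24, rate=44100):
    """Write a mono 16-bit WAV of short clicks at a fixed BPM.

    Each click is a 10 ms decaying 1 kHz sine burst; everything between
    clicks is silence, so the beat grid is unambiguous.
    """
    interval = int(rate * 60 / bpm)   # samples between clicks
    click_len = int(rate * 0.010)     # 10 ms click
    samples = [0] * (rate * seconds)
    for start in range(0, len(samples), interval):
        for i in range(min(click_len, len(samples) - start)):
            decay = 1.0 - i / click_len
            samples[start + i] = int(
                20000 * decay * math.sin(2 * math.pi * 1000 * i / rate)
            )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

write_metronome_wav("metronome_60bpm.wav", bpm=60)
```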

The model iterated for another hour or so and produced a program that seemed to work, reporting the correct BPM. When I inspected the code, however, I found that the model had chosen the test case's expected value as the default for the variable that was eventually emitted. I changed the default, rebuilt the program, and saw that indeed the value was just hardcoded to the correct answer.

With the updated line in place I set about creating another test case, and accusing the model of its laziness and duplicitousness. After another couple of hours of GPU torture, a program was emitted that could estimate BPM using the Queen Mary's Beat Detection Algorithm (QMBDA) within a small margin of error. Success! Right? I think in total at this point I had waited on about 5 hours of compute time.

Oops, We Picked a Hard Problem

It turns out that beat detection from a digital audio file is a fraught estimation problem, full of inaccuracies from things like sample rates and floating-point math, and people smarter than I am have dedicated PhD theses to it. I, on the other hand, while very good at computer, am not an expert in sonic analysis, so I have to defer to the experts on things like how accurate I can expect my program to be when estimating BPM from a .wav file.

So when the model produced this near-correct estimate, I was excited!

Loaded: metronome-ticks/metronome-ticks_4-4_60-BPM.wav
  Sample rate: 44100 Hz
  Channels: 2
  Frames: 1058400
  Duration: 24.0 s

DF frames collected: 2066
Step size (samples): 512
Window size (samples): 1024
DF values used (after skip): 1981
Number of beats detected: 23
Estimated BPM: 59.4
Beat interval: 1010 ms
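
As a sanity check, the reported numbers are internally consistent: a beat interval in milliseconds converts to BPM by dividing it into the number of milliseconds in a minute.

```python
# 60,000 ms per minute / ms per beat = beats per minute.
beat_interval_ms = 1010  # the tool's reported inter-beat interval
bpm = 60000 / beat_interval_ms
print(round(bpm, 1))  # 59.4, matching the tool's estimate
```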

I had the model explain what it had done and one element of the analysis stood out:

 - BPM estimation from dominant period contour

Looking at the "thin wrapper" I had asked it to write around QMBDA, I saw a lot more math than I anticipated, and my friend had specified that the BPM measured by this program needed to be the exact value from Mixxx.

The goal wasn't to create a beat detection tool with Mixxx as a reference, it turned out. The goal was to extract Mixxx's beat detection algorithm exactly, so that a CLI tool could be used to predict what BPM Mixxx would measure for a given track, because the point of the program was to aid in live performance (DJing). Oh.

Even When You Want It to Just Copy, It Eagerly Predicts the Next Token

The model – despite being told to carefully copy anything involving math or numeric values – had extracted only part of the algorithm and then written its own calculation for the final part. That wouldn't do.

We made a few more metronomic test tracks and I re-prompted the model with my new understanding of the world and some more test cases. The model discovered Mixxx's BPM algorithm and extracted more of it, replacing more of its custom implementation.

The new program was much closer. In fact, it worked perfectly at this point for audio tracks with a whole-number, easily identifiable BPM, like a metronome or electronic dance music with a locked-in beat grid produced using tracking software.

We tried it on a few real tracks, and on some of them the CLI's output came within as little as five hundredths of a BPM of the Mixxx output. Since Mixxx displays BPM to hundredths of precision, however, the CLI tool needed to match it exactly; five hundredths off is not sufficiently precise.
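
Since Mixxx shows two decimal places, "exact" effectively means agreement after rounding to hundredths. A small helper (hypothetical, not part of the project) makes the bar concrete:

```python
def matches_mixxx_display(estimated, reference):
    """Mixxx displays BPM to hundredths, so 'exact' here means
    equality after rounding to two decimal places."""
    return round(estimated, 2) == round(reference, 2)

print(matches_mixxx_display(120.003, 120.0))  # True: both display as 120.00
print(matches_mixxx_display(127.95, 128.00))  # False: off by five hundredths
```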

I had the model write a proper test suite at this point: the metronome tracks for regression protection, and then the musical tracks that were already working (whole-number BPM) and the ones that were not working.

After writing the tests I went back to the agent and explained the situation: here are the tracks; here is the state of the world (captured earlier in an AGENTS.md); we are VERY CLOSE, but these must be EXACT.

It iterated a while longer, mostly finding places where the floating-point math had changed as the code was adapted to run outside of the GUI.
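
This failure mode is easy to demonstrate: floating-point addition is not associative, so a refactor that merely reorders operations (say, accumulating a sum in a different container or loop order while porting away from Qt types) can change the low-order digits of a result even when the code looks equivalent.

```python
# The same three values summed in two different groupings
# produce two different doubles.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```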

After a few days of intermittent prompting, various attempts to guide the model, and one clean-room, from-the-top restart armed with "what we've learned" lessons from the first attempt, I was nonetheless unable to help the agent convert the Qt types into native C++ types without subtly modifying the floating-point math in some way, and I was not able to produce a version of the application that matched Mixxx exactly, the way we originally wanted.

That said, I think the requirements for this project would be hard to meet by hand as well, and meeting them would take more effort than I would have bothered with. If I were more familiar with Qt or C++ maybe I could say this more definitively, but I suspect we gave Qwen a near-impossible task due to floating-point minutiae.

However, if the goal was to create a BPM detector from a slice of Mixxx, or modeled after it, we certainly succeeded.

The final version passed the tests I wrote for a variety of metronomic recordings and recordings of simple and complex electronic music, and came within 0.02 BPM of matching Mixxx's estimate for a messy rock track with an unclear real-world BPM.

I think that's pretty impressive for a local LLM, especially given how little understanding of the problem I have. I have no understanding of the signal analysis portion at all, and I haven't written a line of C++ since 2015. This program contains more C++ than I have written in my life.

This was also not a particularly trivial task, and Qwen performed a lot of it admirably, especially for a small model. It was perfectly capable of finding the Queen Mary Beat Detection Algorithm library and porting it into a new project, and that is not the kind of programming I was ever particularly excited about, anyway.

Here's the code, if you want to see: https://github.com/gigawhitlocks/queen-mary-beat-detector.
