This essay will introduce a rule for AI use that should prevent a whole host of bad outcomes. More importantly, the rule is built on top of some smart philosophy, which means it is carefully selected to be invariant — that is, it shouldn’t change until we get to Artificial General Intelligence (AGI).
This resistance to change is useful! In a world of rapid AI development, a business policy built around things that don’t change is worth its weight in gold.
I’ll introduce the rule, explain why it works, and then walk you through the various instantiations of the rule in action. This rule is not unique — only the synthesis is unique. It turns out that sophisticated users of AI have all stumbled onto the rule and actively apply it in their work; this is merely a formulation that is easy to remember and therefore easier to apply.
I’ve named this rule the Vaughn Tan Rule because I’ve adapted it from work that the consultant, business professor, and writer Vaughn Tan has done recently on AI use. Vaughn is currently shaping these ideas for AI education at various institutions around the world. Needless to say, everything good about these ideas is attributable to him; all the mistakes here are mine alone.
The Vaughn Tan Rule goes like this:
Do NOT outsource your subjective value judgments to an AI, unless you have a good reason to, in which case make sure the reason is explicitly stated.
Let’s break it down:
The Vaughn Tan Rule starts from a practical observation: in order to come up with good AI tools, education, workflows, and use cases, we need a guiding policy for use. Vaughn’s observation is that we should start from something that humans can do right now, and that AIs cannot do at all — at least not for some time.
Today, the one thing that humans can do that AIs cannot do is meaning-making.
What is meaning-making? Vaughn’s chosen definition is that meaning-making is “any decision we make about the subjective value of a thing.”
Humans make meaning all the time — in business and in life. We:
These four types of meaning-making underpin the philosophical argument for the rule. They are the four types of meaning-making that Vaughn recognises in his work.
Notice that capability at meaning-making is not a spectrum. Unlike other capabilities, AIs will not slowly ‘get better’ at meaning-making as their capabilities advance. Instead, we should expect some kind of phase shift: if we ever get to AGI, there should be a societal transition period in which AIs are regarded as equivalent meaning-making entities and are afforded rights, much as humans and animals are.
(Or, maybe not — predictions are hard.)
Because we do not grant Large Language Models (LLMs) rights today, or regard them as equivalent responsibility-bearing entities, we should not outsource our subjective value judgments to AIs.[1]
That argument should stand on its own, but Vaughn points out that even if we do grant LLMs rights, it is questionable — as in, literally, you should question this! — whether you want to outsource your subjective value judgments to something else.
And finally, on a practical level: you are not currently able to say “this $500M loss was the fault of ChatGPT, because we did what it told us to do.” This is about as reasonable as saying “I went to our garage and did what the hammer told me to do”, and you will be justly derided for it (or possibly fired, in the case of a $500M write-down).
The point of giving you this explanation is to reassure you that there is some theoretical basis behind the Vaughn Tan Rule, and that this theoretical basis will not change. It is one of the sublime ironies of our current moment that while cutting-edge LLMs are the crowning achievement of machine learning (as STEM a field as any), the Vaughn Tan Rule is derived from the humanities.
Commoncog is focused on pragmatic ideas for use in business and in life, so I’ll stop here. For more details on the theory, read Vaughn’s ongoing series on the topic.
I claimed earlier that most sophisticated users of AI have stumbled onto a version of the Vaughn Tan Rule in their own AI use. I’ll start with some trivial examples from my own life, just to give you a flavour of how to use this, and then shift over to some successful real world examples.
Hopefully this gives you an idea of how to use the rule. To drive the point home, here are several examples of the Vaughn Tan Rule in action from across the world:
English teachers spend a large amount of time grading essays and providing students with written feedback. Daisy Christodoulou is Director of Education at No More Marking, a provider of online comparative judgment software for schools. In a podcast interview this past March, she summarised what she’s found from more than two years of LLM experimentation with real-world school grading systems.
At first, Christodoulou’s team thought that LLMs could help with teacher marking directly. That is, give the LLM a student essay, and then have the LLM grade the essay and write the feedback. But of course LLMs were horrible at this.
Then Christodoulou’s team drew on an insight they already had: humans are also quite bad at direct evaluation. It turns out the gold standard for grading is ‘comparative judgment’: show two essays side by side and ask a human which essay is better, and why. No More Marking had a corpus of 40 million human comparative judgments. So Christodoulou decided to test this approach with LLMs as well. It turned out that LLMs were slightly better at comparative evaluation than at direct evaluation, but they made mistakes that humans just didn’t make when judging comparatively. For instance, for certain pairings of essays, the LLMs tended to pick whichever essay appeared on the left.
So the next thing they did was feed each pair of essays to the LLM twice, once in each order. When they did that, they learnt that the LLM was mostly making these ‘prefer the left-most essay’ errors on essay pairs that were quite close to each other in quality. This was a relief. It meant they could set up the grading system so that essay pairs close in quality would go to a human, which would still cut a teacher’s workload by a fair amount.
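One way to read that as a routing policy: judge each pair twice with the order swapped, and treat disagreement between the two passes as a sign that the pair is close and should go to a human. Here is a minimal sketch of that idea in Python (my illustration, not No More Marking’s actual system), with `llm_judge` as a hypothetical placeholder for whatever LLM call you use:

```python
def llm_judge(left_essay: str, right_essay: str) -> str:
    """Hypothetical LLM call: returns "left" or "right" for the better essay."""
    raise NotImplementedError  # wire up your LLM of choice here


def route_pair(essay_1: str, essay_2: str) -> str:
    """Judge a pair twice, once in each order; escalate disagreements to a human."""
    # Pass 1: essay_1 shown on the left.
    winner_1 = essay_1 if llm_judge(essay_1, essay_2) == "left" else essay_2
    # Pass 2: order swapped, so pure positional bias would flip the verdict.
    winner_2 = essay_2 if llm_judge(essay_2, essay_1) == "left" else essay_1

    if winner_1 == winner_2:
        return "essay_1" if winner_1 == essay_1 else "essay_2"
    # The two passes disagree: likely a close pair, so a human marks it.
    return "human_review"
```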
What should teachers do with the time this frees up? Christodoulou’s team wanted them to spend more of it on feedback, which is what actually helps students improve.
Notice that this is consistent with the exception clause in the Vaughn Tan Rule: Christodoulou’s team was outsourcing subjective value judgments about grading to an AI, with the caveat that the most difficult pairs were still sent to a human grader. But they made this tradeoff because they believed it was worth it: essay assessment is only part of the goal of marking, and if they could do a ‘good enough’ job of it, they could free up human teachers to give better feedback, which they judged to be more important.
So how did Christodoulou’s team approach the feedback task? Of course, the first thing they did was feed essays to an LLM and ask it for writing feedback. They quickly found that the LLM would produce superficially impressive but ultimately very generic feedback. This was not helpful for student improvement, so it was unacceptable.
Next, they decided “ok, let’s have a human in the loop!” They fed the essays to an LLM and then let the teachers read the essay, read the AI-generated feedback, and edit the feedback before giving it to the student. Here, the teachers revolted. Every single teacher said “it takes more time to read the essay and the AI feedback and think about how to modify the AI feedback; by the time I’m done it would’ve been faster if I just read the essay and then wrote the feedback from scratch myself!”
This is known as the ‘Paradox of Automation’: if you have a system that can get to 90–95% quality, getting a human operator to stay alert enough to spot the remaining 5–10% of errors is terribly hard; you would get better total system performance by letting the human do the entire task themselves.
So what did they do next? Christodoulou’s team flipped the approach. First, they built a system where the teachers would be able to leave audio feedback whilst reading the essay. Then they set it up so that multiple teachers could leave audio comments on each piece. The important thing here was that teachers could be as harsh in their feedback as they liked, so that giving feedback was easier for them; they didn’t have to spend time softening their tone.
The system would then use AI to:
- transcribe each teacher’s audio comments,
- combine comments from multiple teachers into a single piece of written feedback for each student, softening the tone along the way, and
- aggregate the most common issues across the class into a high-level report for the teacher.
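To make the division of labour concrete, here is a rough sketch of the shape of such a pipeline, assuming a hypothetical `soften_with_llm` helper and issue tags already extracted from each transcript. It shows where the meaning-making (the teachers’ comments) sits relative to the AI work, and is not meant to reproduce No More Marking’s implementation:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class AudioComment:
    teacher: str
    student: str
    transcript: str      # speech-to-text transcription of the teacher's audio comment
    issues: list[str]    # issue tags (e.g. "tenses") pulled out of the transcript by an LLM

def soften_with_llm(raw_feedback: str) -> str:
    """Hypothetical LLM call: rewrite blunt teacher comments in an encouraging tone."""
    raise NotImplementedError  # wire up your LLM of choice here

def student_feedback(comments: list[AudioComment], student: str) -> str:
    """Combine several teachers' raw comments into one student-facing note."""
    relevant = [c.transcript for c in comments if c.student == student]
    # The judgments come from the teachers; the AI only transcribes and softens.
    return soften_with_llm("\n".join(relevant))

def class_report(comments: list[AudioComment]) -> Counter:
    """Count how many students were flagged for each issue, for the teacher's report."""
    issues_by_student: dict[str, set[str]] = {}
    for c in comments:
        issues_by_student.setdefault(c.student, set()).update(c.issues)
    return Counter(issue for issues in issues_by_student.values() for issue in issues)
```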
This approach works on at least two levels: first, teachers would be able to read — in their high-level report — “20 out of 35 students in your class had problems with tenses” and would then be able to change their lesson plans in response to that.
Second, students feel ‘seen’ when receiving human feedback. Christodoulou says:
… what people have loved about the initial prototype for this version — and we have got some incredible feedback on it — is that you can actually say to the student, ‘This feedback you’re getting, this paragraph, it’s from four or five different teachers in the school.’ And what we’ve found as well is the best kind of audio feedback is when the teacher picks out something — a nice word or a phrase — in the student’s writing and says, ‘I love it when you say X.’ And if a couple of teachers do that, that will feature in the feedback the students get.
So, our initial findings are that the thing people like the most about it and the thing where I will say written feedback does offer something is the feeling of the work being seen and the student feeling seen and the student feeling motivated to want to do their best because a teacher who they know and respect is reading it and paying it attention. So, we think where we are now, we’ve alighted on something that kind of is working, but it is early days and we ourselves need to gather more feedback.
This is as perfect an example of shared meaning-making as one can get.
In fact, notice how the system maximised the amount of teacher bandwidth spent on meaning-making activities … by using AI to offload as much non-meaning-making work as possible!
A direct consequence of the Vaughn Tan Rule is this: AI systems do best when designed to leave the meaning-making bits to the humans.
Compare the example above to the way ex-portfolio manager (and Commoncog member) Brian Lui taught himself fiction writing.
For this example to be credible, you would have to trust my judgment of writing skill. I can vouch for Lui’s improvement at fiction: I was familiar with his writing before he started on this hobby, and I have read his latest fiction after he taught himself with the help of LLMs. My assessment is that there is marked improvement.
Brian’s approach was this (posted in the Commoncog member forums after a bit of refinement):
A. ask the AI for its opinion (subjective value judgement!) but
B. if you disagree, you are ALWAYS right. The AI is for pointing out things you might have overlooked.
C. NEVER ask the AI to do your core skill.
So if I’m writing, I never ask the AI to write anything. I don’t even ask it to suggest better phrases. Instead, I ask it to point out words or phrases that are weak. Then I think of better words myself.
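In practice, Brian’s rule translates into prompts that ask the model to flag, never to rewrite. Here is a sketch of what such a prompt might look like (my construction, not Brian’s exact wording), assuming the OpenAI Python client, though any chat-completion API would do:

```python
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

PROMPT = (
    "Below is a draft scene from my short story. Do NOT rewrite anything, and do NOT "
    "suggest replacement phrases. List the words or phrases you think are weakest, "
    "each with one sentence on why. I will decide what, if anything, to change."
)

def flag_weak_phrases(draft: str, model: str = "gpt-4o") -> str:
    """Ask the model to point out weak spots; the rewriting stays with the human."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{draft}"}],
    )
    return response.choices[0].message.content
```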
Thomas Li is founder and CEO of Daloopa, a financial data provider to many of the world’s top hedge funds and investment banks. (He’s also a Commoncog member, but I digress.) In a recent podcast interview, Li outlined the various ways he’s seen large hedge funds and investment banks integrate LLMs into their work. Li’s perspective is particularly valuable because Daloopa is a data input into these internal AI systems; he has a unique perch from which to observe this world.
Li says:
Li: Here is one really smart use case I’ve seen a lot [in these places] is, let's say you have a bunch of internal notes that you've written about a business and your internal understanding of a business, and you say, “Okay, I want to correlate my internal understanding of the business with every single earnings call that has been created in the last four quarters since I started covering this company. Are there any discrepancies?” AI is phenomenal at that. That’s a pretty common use case. [Though] most buy side firms don’t do that because they are not allowed to upload anything to ChatGPT …
Host: I haven’t thought about that use case! So would I … write my own thesis on a company — which I do all the time — upload it to ChatGPT, and then say: “is this thesis disproven by something that’s happened in the past six months?”
Li: Yeah generally you want to be more specific than that. So you would say, “Here is the latest earnings transcript. Here is the latest conference transcript they were presenting at Morgan Stanley TMT conference, for instance. Here’s the transcript for that. Here are my notes prior to these two events happening. Where are the inconsistencies?” [In this case ChatGPT is acting] like a really smart black-lining function, if you think about it. So that works.
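To make the shape of that workflow concrete, here is a rough sketch of such a ‘black-lining’ prompt (my illustration, not Daloopa’s or any fund’s actual tooling), again assuming the OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI()

def find_inconsistencies(my_notes: str, transcripts: list[str], model: str = "gpt-4o") -> str:
    """Diff my prior view of a company against its latest transcripts."""
    prompt = (
        "Here are my internal notes on a company, written before the events below:\n\n"
        f"{my_notes}\n\n"
        "Here are transcripts of the company's latest earnings call and conference "
        "appearances:\n\n" + "\n\n---\n\n".join(transcripts) + "\n\n"
        "List every place where the transcripts are inconsistent with my notes, quoting "
        "the relevant passage from each source. Do not tell me whether to change my "
        "thesis; that judgment is mine."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```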
And then, in response to a question about the differences in AI adoption between pod shops (large multi-strategy hedge funds) and ‘traditional, sleepy long-only hedge funds’, Li says:
Li: I wouldn't categorise it that way [that pod shops are adopting this more than long-onlys], because what we’ve seen is that both the big pod shops and the big, long-onlys have really put their foot down and invested in AI. I think they’ve seen the benefits. To your point, it’s like [the emergence of] Google all over again. If you don’t use Google as a research analyst [today], you’re just not doing your job right. So I think there is no hesitancy between the pod shops and the long-onlys at saying: “We got it. We got to do something here. We got to figure out how to create applications and [understand] how it works.”
[Note: in a separate part of the interview Li notes that such firms are still a small subset of the firms on Wall Street; this is leading edge and not yet widespread. But he sees adoption across all sorts of actors in high finance. The main commonality across these groups is that a principal has autonomy and access to a research budget.]
But you're absolutely right on the research budget front, there are firms with huge research budgets, and there are firms with much smaller research budgets. Generally big AUM firms, doesn’t matter if they’re pod or long-only or long-short, have much bigger research budgets, and the big discrepancy that we’re seeing is whether or not they build internal tools. So the thing about AI that is very different from a search engine is you will never build your own search engine, because the foundational algorithm of a search engine was always locked within the confines of Google and Microsoft. But [today] we have access to a ton of foundational models. I mean, we can literally pick a foundational model today, and three weeks later, find something that’s four times as good and half as expensive, and you can switch on a dime, right? So it’s almost like if Google just released their algorithm, and 20 other companies are releasing their search algorithms too. What that implies is we can all build search in the way we want to build search, we can build extremely fine tuned search, we can build internal search, we can build external search, we can build all sorts of search, right? So that’s what's going on with AI [right now]. The cost of building an internal AI capability now is super low because of what people like to call ‘ChatGPT wrappers’. A ‘ChatGPT wrapper’ is really just software that you build on top of a foundational model that OpenAI provides, or Anthropic provides, or one of these [other] guys provide. And what is very obvious now is there are a whole suite of buy-side firms that's willing and able to build these internal tools.
[…] What we are seeing the really big shops doing is that they’re building this. They’re saying the most difficult part of building this, historically, has always been the foundational model, the logic. But now someone has invented the logic for us. So all I need to do is feed it the information. Feeding it the information is a harder problem than you think, but it’s solvable because you can just leverage the fact that your analysts write the [research] notes. You can leverage the fact that you can purchase transcripts from data vendors. You can leverage the fact that there are vendors like Daloopa that have done the work of extracting all the data [from earnings reports] into a database. So you can say: “hey, I would like to compare how my estimates are relative to company historicals — am I getting more accurate over time, or am I getting less accurate over time?” So now this exercise is completely doable, and answerable in one question, assuming you have access to a foundational model, your analysts’ actual [financial] models, and fundamental [financial] data from Daloopa. Historically, this sort of work is a risk associate’s two weeks’ worth of just manually grinding through Excel.
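As a back-of-the-envelope illustration of the ‘am I getting more accurate over time’ question, the underlying arithmetic is just a per-quarter comparison of estimates against reported figures. This sketch is mine, with made-up numbers, and is not Daloopa’s tooling; it shows the calculation the LLM is being asked to perform against the data it is given:

```python
def estimate_errors(estimates: dict[str, float], actuals: dict[str, float]) -> dict[str, float]:
    """Absolute percentage error of each quarterly estimate vs the reported figure."""
    return {
        quarter: abs(estimates[quarter] - actuals[quarter]) / abs(actuals[quarter])
        for quarter in estimates
        if quarter in actuals and actuals[quarter] != 0
    }

# Example: revenue estimates vs reported revenue (all numbers are made up).
estimates = {"2024Q1": 102.0, "2024Q2": 110.0, "2024Q3": 118.0}
actuals = {"2024Q1": 100.0, "2024Q2": 111.5, "2024Q3": 117.0}

errors = estimate_errors(estimates, actuals)
improving = list(errors.values()) == sorted(errors.values(), reverse=True)
print(errors, "=> getting more accurate" if improving else "=> not consistently improving")
```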
Notice how all of the examples Li cites — and the set-up for the systems these funds are building — do not outsource subjective value judgments to an AI.
Vaughn Tan tried to get an AI tool to schedule his meetings for him when he was in Chicago a few months back. The AI tool failed spectacularly. Calendar scheduling is one of those deceptively procedural tasks that seems like it would be trivially automatable, but actually demands a fair amount of meaning-making from humans in order to do a consistently good job of it.
For instance, a scheduler would have to know “I am meeting John, who is a dear friend I have not met for a decade, and I prioritise meeting him above nearly every other meeting during this period” and “but my boss has asked me to meet this client that is super important to the company, so that is probably the second highest priority because not doing so would have negative professional consequences for me … but I am willing to sacrifice this for a meeting with John” and also “it would be very nice if I could squeeze in a meeting with Alec and Joey, but only if they are free to meet at a wine bar, since they are excellent wine people; otherwise meeting with them is not super important to me, and the meeting is only worth it if we get to try some new wines together.”
The Vaughn Tan Rule tells us that AI calendar-scheduling tools work best for events with low meaning-making content (such as sales or work calls of roughly equal value, during the more predictable social context of one’s working hours). It also tells us that it is probably best to defer to the human when scheduling an event requires a large amount of subjective value judgment.
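As a toy sketch of what ‘defer to the human’ might look like inside a scheduling tool, consider the routing below. The priorities and conditions are exactly the kind of subjective judgments only the human can supply; none of this is a real product’s API:

```python
from dataclasses import dataclass, field

@dataclass
class MeetingRequest:
    who: str
    # Subjective value judgments that only the human can supply:
    priority: int | None = None                           # None means "ask the human"
    conditions: list[str] = field(default_factory=list)   # e.g. "only at a wine bar"

def triage(requests: list[MeetingRequest]) -> tuple[list[MeetingRequest], list[MeetingRequest]]:
    """Auto-schedule only what carries no meaning-making; punt the rest to the human."""
    auto, needs_human = [], []
    for request in requests:
        if request.priority is None or request.conditions:
            needs_human.append(request)   # subjective judgment required
        else:
            auto.append(request)          # low meaning-making content: safe to automate
    auto.sort(key=lambda r: r.priority)   # 1 = highest priority, scheduled first
    return auto, needs_human
```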
In ‘The 70% Problem: Hard truths about AI-assisted Coding’, notable programmer Addy Osmani lists a number of observations from two years of investigating AI-assisted development.
Osmani writes:
Here's the most counterintuitive thing I've discovered: AI tools help experienced developers more than beginners. This seems backward – shouldn't AI democratize coding?
The reality is that AI is like having a very eager junior developer on your team. They can write code quickly, but they need constant supervision and correction. The more you know, the better you can guide them.
This creates what I call the "knowledge paradox":
- Seniors use AI to accelerate what they already know how to do
- Juniors try to use AI to learn what to do
- The results differ dramatically
I've watched senior engineers use AI to:
- Rapidly prototype ideas they already understand
- Generate basic implementations they can then refine
- Explore alternative approaches to known problems
- Automate routine coding tasks
Meanwhile, juniors often:
- Accept incorrect or outdated solutions
- Miss critical security and performance considerations
- Struggle to debug AI-generated code
- Build fragile systems they don't fully understand
Osmani’s piece comes from December 2024. It was popularised on The Pragmatic Engineer by Gergely Orosz in January 2025. In the months since, we’ve seen many more reports confirming the observations from his essay. And as mentioned previously, whether or not vibe coding is acceptable to you depends on the context in which you want to use the code. The more important or fraught the code, the more important human judgment becomes.
For another concrete example, I’d like to point to this post from Marty Cagan, the notable product leader and partner at Silicon Valley Product Group (SVPG):
Many of you know of the product discovery coach Teresa Torres. At SVPG, we have long considered her one of the very best discovery coaches in the world, and we strongly recommend her book and her courses to everyone that is serious about product.
Recently she had an unfortunate injury (she broke her ankle playing ice hockey), but in one of the best examples of turning lemons into lemonade, after a rough surgery, she decided to make good use of her forced immobility by learning how to build an AI product.
In this case, she wanted to create an intelligent product that could act as a coach to someone learning how to conduct effective customer interviews.
(…) please notice how she was not delegating the decisions or the thinking to the technology. In every case, the decisions were based on her product sense and knowledge of product discovery. This is how you want to utilise generative AI tools (emphasis added).
You may watch Torres’s video on YouTube here.
So what have I shown you? I’ve given you three things in this essay:
- The rule itself: do not outsource your subjective value judgments to an AI, unless you have a good, explicitly stated reason to.
- The theoretical basis for the rule: meaning-making is the one thing humans can do today that AIs cannot.
- A set of real-world examples of the rule in action, from sophisticated AI users in education, fiction writing, finance, scheduling, programming, and product work.
What should you do with this essay? You could follow up by reading Vaughn Tan’s work on meaning-making in AI, since that series contains the philosophical arguments that anchor the rule (along with some non-obvious implications). But I suggest you test the rule immediately in your own AI use. Are there exceptions? When are you willing to delegate your subjective value judgments to an AI, and when are you not?
If you believe this rule makes sense, how do you teach your kids or your parents about this rule? How might you introduce it to your workplace?
I look forward to your reports of use.
Acknowledgements: special thanks to Vaughn Tan, whose work underpins all of this; Roger Williams, who was the first to alert me to the practical applications of Vaughn’s ideas; and Brian Lui, for letting me publish his creative writing improvement method. Thanks also to Tim Wilson, Crystal Widjaja, and Bennett Clement, who read and commented on drafts of this piece.
1. This is actually the main, load-bearing argument. We grant rights to meaning-making entities such as humans and animals. We do not currently afford AIs those same rights. Therefore, as a society, we do not treat AIs as meaning-making entities. ↩︎