This essay will introduce a rule for AI use that should prevent a whole host of bad outcomes. More importantly, the rule is built on top of some smart philosophy, which means it is carefully selected to be invariant — that is, it shouldn’t change until we get to Artificial General Intelligence (AGI).
This resistance to change is useful! In a world of rapid AI development, a business policy built around things that don’t change is worth its weight in gold.
I’ll introduce the rule, explain why it works, and then walk you through the various instantiations of the rule in action. This rule is not unique — only the synthesis is unique. It turns out that sophisticated users of AI have all stumbled onto the rule and actively apply it in their work; this is merely a formulation that is easy to remember and therefore easier to apply.
I’ve named this rule the Vaughn Tan Rule because I’ve adapted it from work that the consultant, business professor, and writer Vaughn Tan has done recently on AI use. Vaughn is currently shaping these ideas for AI education at various institutions around the world. Needless to say, everything good about these ideas is attributable to him; all the mistakes here are mine alone.
The Vaughn Tan Rule goes like this:
Do NOT outsource your subjective value judgments to an AI, unless you have a good reason to, in which case make sure the reason is explicitly stated.
Let’s break it down:
The Vaughn Tan Rule starts from a practical observation: in order to come up with good AI tools, education, workflows, and use cases, we need a guiding policy for use. Vaughn’s observation is that we should start from something that humans can do right now, and that AIs cannot do at all — at least not for some time.
Today, the one thing that humans can do that AIs cannot do is meaning-making.
What is meaning-making? Vaughn’s chosen definition is that meaning-making is “any decision we make about the subjective value of a thing.”
Humans make meaning all the time — in business and in life. We:
These four types of meaning-making underpin the philosophical argument for the rule. They are the four types of meaning-making that Vaughn recognises in his work.
Notice that capability at meaning-making is not a spectrum. Unlike other capabilities, AIs will not slowly ‘get better’ at meaning-making as their capabilities advance. Instead, we should expect some kind of phase shift: if we ever get to AGI, there should be a societal transition period in which AIs are regarded as equivalent meaning-making entities and are afforded rights, much as humans and animals are.
(Or, maybe not — predictions are hard.)
Because we do not grant Large Language Models (LLMs) rights today, or regard them as equivalent responsibility-bearing entities, we should not outsource our subjective value judgments to AIs.[1]
That argument should stand on its own, but Vaughn points out that even if we do grant LLMs rights, it is questionable — as in, literally, you should question this! — whether you want to outsource your subjective value judgments to something else.
And finally, on a practical level: you are not currently able to say “this $500M loss was the fault of ChatGPT, because we did what it told us to do.” This is about as reasonable as saying “I went to our garage and did what the hammer told me to do”, and you will be justly derided for it (or possibly fired, in the case of a $500M write-down).
The point of giving you this explanation is to reassure you that there is some theoretical basis behind the Vaughn Tan Rule, and that this theoretical basis will not change. It is one of the sublime ironies of our current moment that while cutting-edge LLMs are the crowning achievement of machine learning (as STEM a field as any), the Vaughn Tan Rule is derived from the humanities.
Commoncog is focused on pragmatic ideas for use in business and in life, so I’ll stop here. For more details on the theory, read Vaughn’s ongoing series on the topic.
I claimed earlier that most sophisticated users of AI have stumbled onto a version of the Vaughn Tan Rule in their own AI use. I’ll start with some trivial examples from my own life, just to give you a flavour of how to use this, and then shift over to some successful real world examples.
Hopefully this gives you an idea of how to use the rule. To drive the point home, here are several examples of the Vaughn Tan Rule in action from across the world:
English teachers spend a large amount of time grading essays and providing students with written feedback. Daisy Christodoulou is Director of Education at No More Marking, a provider of online comparative judgment software for schools. In a podcast interview this past March, she summarised what she’s found from more than two years of LLM experimentation with real-world school grading systems.
At first, Christodoulou’s team thought that LLMs could help with teacher marking directly. That is, give the LLM a student essay, and then have the LLM grade the essay and write the feedback. But of course LLMs were horrible at this.
Then Christodoulou’s team drew on an insight they already had: humans are also quite bad at direct evaluation. It turns out the gold standard for grading is ‘comparative judgment’: show two essays side by side and ask a human which essay is better, and why. No More Marking had a corpus of 40 million human comparative judgments. So Christodoulou decided to test this approach with LLMs as well. It turned out that LLMs were slightly better at comparative evaluation than at direct evaluation, but they made mistakes that humans just didn’t make when judging comparatively. For instance, for certain pairings of essays, the LLMs tended to pick whichever essay appeared on the left.
So the next thing they did was feed each pair of essays to the LLM twice, once in each order. When they did that, they learnt that the LLM was mostly making these ‘prefer the left-most essay’ errors on essay pairs that were quite close to each other in quality. This was a relief. It meant they could set up the grading system so that essay pairs close in quality would go to a human, which would still cut a teacher’s workload by a fair amount.
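One way to read that as a routing policy: judge each pair twice with the order swapped, and treat disagreement between the two passes as a sign that the pair is close and should go to a human. Here is a minimal sketch of that idea in Python (my illustration, not No More Marking’s actual system), with `llm_judge` as a hypothetical placeholder for whatever LLM call you use:

```python
def llm_judge(left_essay: str, right_essay: str) -> str:
    """Hypothetical LLM call: returns "left" or "right" for the better essay."""
    raise NotImplementedError  # wire up your LLM of choice here


def route_pair(essay_1: str, essay_2: str) -> str:
    """Judge a pair twice, once in each order; escalate disagreements to a human."""
    # Pass 1: essay_1 shown on the left.
    winner_1 = essay_1 if llm_judge(essay_1, essay_2) == "left" else essay_2
    # Pass 2: order swapped, so pure positional bias would flip the verdict.
    winner_2 = essay_2 if llm_judge(essay_2, essay_1) == "left" else essay_1

    if winner_1 == winner_2:
        return "essay_1" if winner_1 == essay_1 else "essay_2"
    # The two passes disagree: likely a close pair, so a human marks it.
    return "human_review"
```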
What should teachers do with the time this frees up? Christodoulou’s team wanted them to spend more of it on feedback, which is what actually helps students improve.
Notice that this is consistent with the exception clause in the Vaughn Tan Rule: Christodoulou’s team was outsourcing subjective value judgments about grading to an AI, with the caveat that the most difficult pairs were still sent to a human grader. But they made this tradeoff because they believed it was worth it: essay assessment is only part of the goal of marking, and if they could do a ‘good enough’ job of it, they could free up human teachers to give better feedback, which they judged to be more important.
So how did Christodoulou’s team approach the feedback task? Of course, the first thing they did was feed essays to an LLM and ask it for writing feedback. They quickly found that the LLM would produce superficially impressive but ultimately very generic feedback. This was not helpful for student improvement, so it was unacceptable.
Next, they decided “ok, let’s have a human in the loop!” They fed the essays to an LLM and then let the teachers read the essay, read the AI-generated feedback, and edit the feedback before giving it to the student. Here, the teachers revolted. Every single teacher said “it takes more time to read the essay and the AI feedback and think about how to modify the AI feedback; by the time I’m done it would’ve been faster if I just read the essay and then wrote the feedback from scratch myself!”
This is known as the ‘Paradox of Automation’: if you have a system that can get to 90–95% quality, getting a human operator to stay alert enough to spot the remaining 5–10% of errors is terribly hard; you would get better total system performance by letting the human do the entire task themselves.
So what did they do next? Christodoulou’s team flipped the approach. First, they built a system where the teachers would be able to leave audio feedback whilst reading the essay. Then they set it up so that multiple teachers could leave audio comments on each piece. The important thing here was that teachers could be as harsh in their feedback as they liked, so that giving feedback was easier for them; they didn’t have to spend time softening their tone.
The system would then use AI to:
- transcribe each teacher’s audio comments,
- combine comments from multiple teachers into a single piece of written feedback for each student, softening the tone along the way, and
- aggregate the most common issues across the class into a high-level report for the teacher.
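To make the division of labour concrete, here is a rough sketch of the shape of such a pipeline, assuming a hypothetical `soften_with_llm` helper and issue tags already extracted from each transcript. It shows where the meaning-making (the teachers’ comments) sits relative to the AI work, and is not meant to reproduce No More Marking’s implementation:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class AudioComment:
    teacher: str
    student: str
    transcript: str      # speech-to-text transcription of the teacher's audio comment
    issues: list[str]    # issue tags (e.g. "tenses") pulled out of the transcript by an LLM

def soften_with_llm(raw_feedback: str) -> str:
    """Hypothetical LLM call: rewrite blunt teacher comments in an encouraging tone."""
    raise NotImplementedError  # wire up your LLM of choice here

def student_feedback(comments: list[AudioComment], student: str) -> str:
    """Combine several teachers' raw comments into one student-facing note."""
    relevant = [c.transcript for c in comments if c.student == student]
    # The judgments come from the teachers; the AI only transcribes and softens.
    return soften_with_llm("\n".join(relevant))

def class_report(comments: list[AudioComment]) -> Counter:
    """Count how many students were flagged for each issue, for the teacher's report."""
    issues_by_student: dict[str, set[str]] = {}
    for c in comments:
        issues_by_student.setdefault(c.student, set()).update(c.issues)
    return Counter(issue for issues in issues_by_student.values() for issue in issues)
```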
This approach works on at least two levels: first, teachers would be able to read — in their high-level report — “20 out of 35 students in your class had problems with tenses” and would then be able to change their lesson plans in response to that.
Second, students feel ‘seen’ when receiving human feedback. Christodoulou says:
… what people have loved about the initial prototype for this version — and we have got some incredible feedback on it — is that you can actually say to the student, ‘This feedback you’re getting, this paragraph, it’s from four or five different teachers in the school.’ And what we’ve found as well is the best kind of audio feedback is when the teacher picks out something — a nice word or a phrase — in the student’s writing and says, ‘I love it when you say X.’ And if a couple of teachers do that, that will feature in the feedback the students get.
So, our initial findings are that the thing people like the most about it and the thing where I will say written feedback does offer something is the feeling of the work being seen and the student feeling seen and the student feeling motivated to want to do their best because a teacher who they know and respect is reading it and paying it attention. So, we think where we are now, we’ve alighted on something that kind of is working, but it is early days and we ourselves need to gather more feedback.
This is as perfect an example of shared meaning-making as one can get.
In fact, notice how the system maximised the amount of teacher bandwidth spent on meaning-making activities … by using AI to offload as much non-meaning-making work as possible!
A direct consequence of the Vaughn Tan Rule is this: AI systems do best when designed to leave the meaning-making bits to the humans.
Compare the example above to the way ex-portfolio manager (and Commoncog member) Brian Lui taught himself fiction writing.
For this example to be credible, you would have to trust my judgment of writing skill. I can vouch for Lui’s improvement at fiction: I was familiar with his writing before he started on this hobby, and I have read his latest fiction after he taught himself with the help of LLMs. My assessment is that there is marked improvement.
Brian’s approach was this (posted in the Commoncog member forums after a bit of refinement):
A. ask the AI for its opinion (subjective value judgement!) but
B. if you disagree, you are ALWAYS right. The AI is for pointing out things you might have overlooked.
C. NEVER ask the AI to do your core skill.
So if I’m writing, I never ask the AI to write anything. I don’t even ask it to suggest better phrases. Instead, I ask it to point out words or phrases that are weak. Then I think of better words myself.
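In practice, Brian’s rule translates into prompts that ask the model to flag, never to rewrite. Here is a sketch of what such a prompt might look like (my construction, not Brian’s exact wording), assuming the OpenAI Python client, though any chat-completion API would do:

```python
from openai import OpenAI  # pip install openai; assumes OPENAI_API_KEY is set

client = OpenAI()

PROMPT = (
    "Below is a draft scene from my short story. Do NOT rewrite anything, and do NOT "
    "suggest replacement phrases. List the words or phrases you think are weakest, "
    "each with one sentence on why. I will decide what, if anything, to change."
)

def flag_weak_phrases(draft: str, model: str = "gpt-4o") -> str:
    """Ask the model to point out weak spots; the rewriting stays with the human."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{PROMPT}\n\n{draft}"}],
    )
    return response.choices[0].message.content
```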
Thomas Li is founder and CEO of Daloopa, a financial data provider to many of the world’s top hedge funds and investment banks. (He’s also a Commoncog member, but I digress.) In a recent podcast interview, Li outlined the various ways he’s seen large hedge funds and investment banks integrate LLMs into their work. Li’s perspective is particularly valuable because Daloopa is a data input into these internal AI systems; he has a unique perch from which to observe this world.
Li says:
Li: Here is one really smart use case I’ve seen a lot [in these places] is, let's say you have a bunch of internal notes that you've written about a business and your internal understanding of a business, and you say, “Okay, I want to correlate my internal understanding of the business with every single earnings call that has been created in the last four quarters since I started covering this company. Are there any discrepancies?” AI is phenomenal at that. That’s a pretty common use case. [Though] most buy side firms don’t do that because they are not allowed to upload anything to ChatGPT …
Host: I haven’t thought about that use case! So would I … write my own thesis on a company — which I do all the time — upload it to ChatGPT, and then say: “is this thesis disproven by something that’s happened in the past six months?”
Li: Yeah generally you want to be more specific than that. So you would say, “Here is the latest earnings transcript. Here is the latest conference transcript they were presenting at Morgan Stanley TMT conference, for instance. Here’s the transcript for that. Here are my notes prior to these two events happening. Where are the inconsistencies?” [In this case ChatGPT is acting] like a really smart black-lining function, if you think about it. So that works.
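To make the shape of that workflow concrete, here is a rough sketch of such a ‘black-lining’ prompt (my illustration, not Daloopa’s or any fund’s actual tooling), again assuming the OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI()

def find_inconsistencies(my_notes: str, transcripts: list[str], model: str = "gpt-4o") -> str:
    """Diff my prior view of a company against its latest transcripts."""
    prompt = (
        "Here are my internal notes on a company, written before the events below:\n\n"
        f"{my_notes}\n\n"
        "Here are transcripts of the company's latest earnings call and conference "
        "appearances:\n\n" + "\n\n---\n\n".join(transcripts) + "\n\n"
        "List every place where the transcripts are inconsistent with my notes, quoting "
        "the relevant passage from each source. Do not tell me whether to change my "
        "thesis; that judgment is mine."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```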
And then, in response to a question about the differences in AI adoption between pod shops (large multi-strategy hedge funds) and ‘traditional, sleepy long-only hedge funds’, Li says:
Li: I wouldn't categorise it that way [that pod shops are adopting this more than long-onlys], because what we’ve seen is that both the big pod shops and the big, long-onlys have really put their foot down and invested in AI. I think they’ve seen the benefits. To your point, it’s like [the emergence of] Google all over again. If you don’t use Google as a research analyst [today], you’re just not doing your job right. So I think there is no hesitancy between the pod shops and the long-onlys at saying: “We got it. We got to do something here. We got to figure out how to create applications and [understand] how it works.”
[Note: in a separate part of the interview Li notes that such firms are still a small subset of the firms on Wall Street; this is leading edge and not yet widespread. But he sees adoption across all sorts of actors in high finance. The main commonality across these groups is that a principal has autonomy and access to a research budget.]
But you're absolutely right on the research budget front, there are firms with huge research budgets, and there are firms with much smaller research budgets. Generally big AUM firms, doesn’t matter if they’re pod or long-only or long-short, have much bigger research budgets, and the big discrepancy that we’re seeing is whether or not they build internal tools. So the thing about AI that is very different from a search engine is you will never build your own search engine, because the foundational algorithm of a search engine was always locked within the confines of Google and Microsoft. But [today] we have access to a ton of foundational models. I mean, we can literally pick a foundational model today, and three weeks later, find something that’s four times as good and half as expensive, and you can switch on a dime, right? So it’s almost like if Google just released their algorithm, and 20 other companies are releasing their search algorithms too. What that implies is we can all build search in the way we want to build search, we can build extremely fine tuned search, we can build internal search, we can build external search, we can build all sorts of search, right? So that’s what's going on with AI [right now]. The cost of building an internal AI capability now is super low because of what people like to call ‘ChatGPT wrappers’. A ‘ChatGPT wrapper’ is really just software that you build on top of a foundational model that OpenAI provides, or Anthropic provides, or one of these [other] guys provide. And what is very obvious now is there are a whole suite of buy-side firms that's willing and able to build these internal tools.
[…] What we are seeing the really big shops doing is that they’re building this. They’re saying the most difficult part of building this, historically, has always been the foundational model, the logic. But now someone has invented the logic for us. So all I need to do is feed it the information. Feeding it the information is a harder problem than you think, but it’s solvable because you can just leverage the fact that your analysts write the [research] notes. You can leverage the fact that you can purchase transcripts from data vendors. You can leverage the fact that there are vendors like Daloopa that have done the work of extracting all the data [from earnings reports] into a database. So you can say: “hey, I would like to compare how my estimates are relative to company historicals — am I getting more accurate over time, or am I getting less accurate over time?” So now this exercise is completely doable, and answerable in one question, assuming you have access to a foundational model, your analysts’ actual [financial] models, and fundamental [financial] data from Daloopa. Historically, this sort of work is a risk associate’s two weeks’ worth of just manually grinding through Excel.
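As a back-of-the-envelope illustration of the ‘am I getting more accurate over time’ question, the underlying arithmetic is just a per-quarter comparison of estimates against reported figures. This sketch is mine, with made-up numbers, and is not Daloopa’s tooling; it shows the calculation the LLM is being asked to perform against the data it is given:

```python
def estimate_errors(estimates: dict[str, float], actuals: dict[str, float]) -> dict[str, float]:
    """Absolute percentage error of each quarterly estimate vs the reported figure."""
    return {
        quarter: abs(estimates[quarter] - actuals[quarter]) / abs(actuals[quarter])
        for quarter in estimates
        if quarter in actuals and actuals[quarter] != 0
    }

# Example: revenue estimates vs reported revenue (all numbers are made up).
estimates = {"2024Q1": 102.0, "2024Q2": 110.0, "2024Q3": 118.0}
actuals = {"2024Q1": 100.0, "2024Q2": 111.5, "2024Q3": 117.0}

errors = estimate_errors(estimates, actuals)
improving = list(errors.values()) == sorted(errors.values(), reverse=True)
print(errors, "=> getting more accurate" if improving else "=> not consistently improving")
```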
Notice how all of the examples Li cites — and the set-up for the systems these funds are building — do not outsource subjective value judgments to an AI.
Vaughn Tan tried to get an AI tool to schedule his meetings for him when he was in Chicago a few months back. The AI tool failed spectacularly. Calendar scheduling is one of those deceptively procedural tasks that seems like it would be trivially automatable, but actually demands a fair amount of meaning-making from humans in order to do a consistently good job of it.
For instance, a scheduler would have to know “I am meeting John, who is a dear friend I have not met for a decade, and I prioritise meeting him above nearly every other meeting during this period” and “but my boss has asked me to meet this client that is super important to the company, so that is probably the second highest priority because not doing so would have negative professional consequences for me … but I am willing to sacrifice this for a meeting with John” and also “it would be very nice if I could squeeze in a meeting with Alec and Joey, but only if they are free to meet at a wine bar, since they are excellent wine people; otherwise meeting with them is not super important to me, and the meeting is only worth it if we get to try some new wines together.”
The Vaughn Tan Rule tells us that AI calendar-scheduling tools work best for events with low meaning-making content (such as sales or work calls of roughly equal value, during the more predictable social context of one’s working hours). It also tells us that it is probably best to defer to the human when scheduling an event requires a large amount of subjective value judgment.
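As a toy sketch of what ‘defer to the human’ might look like inside a scheduling tool, consider the routing below. The priorities and conditions are exactly the kind of subjective judgments only the human can supply; none of this is a real product’s API:

```python
from dataclasses import dataclass, field

@dataclass
class MeetingRequest:
    who: str
    # Subjective value judgments that only the human can supply:
    priority: int | None = None                           # None means "ask the human"
    conditions: list[str] = field(default_factory=list)   # e.g. "only at a wine bar"

def triage(requests: list[MeetingRequest]) -> tuple[list[MeetingRequest], list[MeetingRequest]]:
    """Auto-schedule only what carries no meaning-making; punt the rest to the human."""
    auto, needs_human = [], []
    for request in requests:
        if request.priority is None or request.conditions:
            needs_human.append(request)   # subjective judgment required
        else:
            auto.append(request)          # low meaning-making content: safe to automate
    auto.sort(key=lambda r: r.priority)   # 1 = highest priority, scheduled first
    return auto, needs_human
```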
In ‘The 70% Problem: Hard truths about AI-assisted Coding’, notable programmer Addy Osmani lists a number of observations from two years of investigating AI-assisted development.
Osmani writes:
Here's the most counterintuitive thing I've discovered: AI tools help experienced developers more than beginners. This seems backward – shouldn't AI democratize coding?
The reality is that AI is like having a very eager junior developer on your team. They can write code quickly, but they need constant supervision and correction. The more you know, the better you can guide them.
This creates what I call the "knowledge paradox":
- Seniors use AI to accelerate what they already know how to do
- Juniors try to use AI to learn what to do
- The results differ dramatically
I've watched senior engineers use AI to:
- Rapidly prototype ideas they already understand
- Generate basic implementations they can then refine
- Explore alternative approaches to known problems
- Automate routine coding tasks
Meanwhile, juniors often:
- Accept incorrect or outdated solutions
- Miss critical security and performance considerations
- Struggle to debug AI-generated code
- Build fragile systems they don't fully understand
Osmani’s piece comes from December 2024. It was popularised on The Pragmatic Engineer by Gergely Orosz in January 2025. In the months since, we’ve seen many more reports confirming the observations from his essay. And as mentioned previously, whether or not vibe coding is acceptable to you depends on the context in which you want to use the code. The more important or fraught the code, the more important human judgment becomes.
For another concrete example, I’d like to point to this post from Marty Cagan, the notable product leader and partner at Silicon Valley Product Group (SVPG):
Many of you know of the product discovery coach Teresa Torres. At SVPG, we have long considered her one of the very best discovery coaches in the world, and we strongly recommend her book and her courses to everyone that is serious about product.
Recently she had an unfortunate injury (she broke her ankle playing ice hockey), but in one of the best examples of turning lemons into lemonade, after a rough surgery, she decided to make good use of her forced immobility by learning how to build an AI product.
In this case, she wanted to create an intelligent product that could act as a coach to someone learning how to conduct effective customer interviews.
(…) please notice how she was not delegating the decisions or the thinking to the technology. In every case, the decisions were based on her product sense and knowledge of product discovery. This is how you want to utilise generative AI tools (emphasis added).
You may watch Torres’s video on YouTube here.
So what have I shown you? I’ve given you three things in this essay:
- The rule itself: do not outsource your subjective value judgments to an AI, unless you have a good, explicitly stated reason to.
- The theoretical basis for the rule: meaning-making is the one thing humans can do today that AIs cannot.
- A set of real-world examples of the rule in action, from sophisticated AI users in education, fiction writing, finance, scheduling, programming, and product work.
What should you do with this essay? You could follow up by reading Vaughn Tan’s work on meaning-making in AI, since that series contains the philosophical arguments that anchor the rule (along with some non-obvious implications). But I suggest you test the rule immediately in your own AI use. Are there exceptions? When are you willing to delegate your subjective value judgments to an AI, and when are you not?
If you believe this rule makes sense, how do you teach your kids or your parents about this rule? How might you introduce it to your workplace?
I look forward to your reports of use.
Acknowledgements: special thanks to Vaughn Tan, whose work underpins all of this; Roger Williams, who was the first to alert me to the practical applications of Vaughn’s ideas; and Brian Lui, for letting me publish his creative writing improvement method. Thanks also to Tim Wilson, Crystal Widjaja, and Bennett Clement, who read and commented on drafts of this piece.
1. This is actually the main, load-bearing argument. We grant rights to meaning-making entities such as humans and animals. We do not currently afford AIs those same rights. Therefore, as a society, we do not treat AIs as meaning-making entities. ↩︎