Stop Bragging About Tokens
Meta's 60 trillion, my 800M, and why neither number means anything on its own.
I need to confess something: I have bragged about absurd amounts of tokens. Guilty.
In April 2026 Meta ran an internal leaderboard called “Claudeonomics” where 85,000 Meta employees competed on token consumption. In 30 days they burned through 60 trillion tokens, a number that means nothing without context. At Anthropic’s public Opus pricing, a 50/50 input-output blend comes out to roughly $15 per million tokens. That would imply around $900 million a month, or $10.8 billion annualized, about 5.4% of Meta’s 2025 revenue. Call it a rough order of magnitude. The Information reported it on April 6, 2026. Two days after the story broke, the dashboard went down. Meta later said the employee who built it took it down “at their discretion.” That disappearance is the story.
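For anyone who wants to rerun the back-of-envelope math, here is a minimal sketch. The token count, the $15 blended price, the headcount, and the 5.4% share all come from the paragraph above. The per-employee figure and the implied revenue are simply derived from them, and the flat blended price is an assumption, not how Meta is actually billed.

```python
# Back-of-envelope check of the figures quoted in the text.
# Assumption: one flat blended price applies to every token.

TOKENS_PER_MONTH = 60e12           # 60 trillion tokens in 30 days
BLENDED_PRICE_PER_MILLION = 15.0   # USD, assumed 50/50 input-output blend
EMPLOYEES = 85_000                 # leaderboard participants
SHARE_OF_REVENUE = 0.054           # "about 5.4%" from the text


def monthly_cost(tokens: float, price_per_million: float) -> float:
    """Cost in USD for a token volume at a flat blended price."""
    return tokens / 1_000_000 * price_per_million


monthly = monthly_cost(TOKENS_PER_MONTH, BLENDED_PRICE_PER_MILLION)
annual = monthly * 12
per_employee = monthly / EMPLOYEES
implied_revenue = annual / SHARE_OF_REVENUE  # revenue consistent with the 5.4% claim

print(f"monthly spend:      ${monthly / 1e6:,.0f}M")
print(f"annualized spend:   ${annual / 1e9:,.1f}B")
print(f"per employee/month: ${per_employee:,.0f}")
print(f"implied revenue:    ${implied_revenue / 1e9:,.0f}B")
```

Change the blended price and the whole estimate moves linearly with it, which is why this is an order of magnitude and not a bill.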
A raw token leaderboard measures who burns the most tokens, which isn’t the same as who ships the most value for the company. When the number on the dashboard is tokens, the top of the leaderboard is the engineer who never thought twice before running Claude on a whim, and the bottom is the engineer who thinks too hard about cost. Sure, tokens and productivity correlate up to a point, but at some scale the correlation breaks, the same way lines of code broke as a proxy for programmer output once higher-level languages let engineers do more with fewer lines.
Every organization using AI is sitting in a version of this right now. The bill is climbing faster than anyone expected, engineering is the early adopter, and the rest of the organization is a couple of quarters behind. The first instinct is to measure, and the measurement makes things worse, because the measurement has no denominator.
We need a denominator
If I told you a restaurant spent $20,000 on ingredients last month, you couldn’t tell me whether it had a great month or a disaster. Paired with meals served, the same number is a cost per plate. Paired with revenue, it’s food cost as a share of sales. Tokens on their own are the ingredient bill, and without a denominator the bill tells you nothing except that you spent money.
Tokens per PR merged, tokens per bug closed, tokens per support ticket resolved, tokens per qualified lead, tokens per feature shipped, tokens per paying user. Pick the metric the team already cares about, divide by it, and the number starts to say something. At Luzia we haven’t figured this out yet. I’m personally looking at tokens per PR shipped, tokens per feature closed, and even tokens per incremental retention. Some of the ratios are still too noisy to act on while others are pointing at obvious things we should have caught quarters ago. Trying to produce the ratio is what forces the question. A ratio you can argue about is more useful than a number you can only stare at.
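The division itself is trivial, which is the point. A minimal sketch, with made-up monthly figures for a hypothetical team (none of these numbers are Luzia's):

```python
from typing import Optional


def tokens_per_outcome(tokens: int, outcomes: int) -> Optional[float]:
    """Tokens spent per unit of outcome; None when there is nothing to divide by."""
    if outcomes <= 0:
        return None
    return tokens / outcomes


# Hypothetical monthly figures for one team, for illustration only.
team = {
    "tokens": 120_000_000,
    "prs_merged": 48,
    "bugs_closed": 30,
    "features_shipped": 0,
}

for denominator in ("prs_merged", "bugs_closed", "features_shipped"):
    ratio = tokens_per_outcome(team["tokens"], team[denominator])
    label = f"tokens per {denominator}"
    if ratio is None:
        print(f"{label}: no outcome recorded, ratio undefined")
    else:
        print(f"{label}: {ratio:,.0f}")
```

Returning None for a zero denominator is a deliberate choice in this sketch: a team with spend and no recorded outcome is a conversation to have, not an infinity to plot.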
Cap or don’t cap
Innovation and control pull in opposite directions here, and most people pick one and lose the other.
Cap usage and you kill the productivity gain. The engineers who figured out how to get 10x out of Claude did it by burning tokens on experiments, most of which went nowhere, and the few that went somewhere paid for the rest. Cap before you know which is which and you’re cutting the top of the curve while keeping the bottom.
Run it open and you’re writing million-dollar checks to Anthropic with no idea what came out the other side. That’s the Meta situation. Finance asks the question, engineering can’t answer it, and the conversation turns into a budget fight nobody wins.
You can spot the opposite mistake in founders bragging about tokens-per-employee in funding announcements, or, stranger still, in companies adding a token allowance to comp packages as if it were a perk. The tokens are spent on company work, so the “perk” is the company buying its own productivity and calling it a benefit. Both are the same leaderboard Meta ran, dressed up as marketing or HR. The theater works for one funding round. After that, investors start asking the tokens-per-what question, and the story collapses into the same conversation engineering had six months earlier.

You don’t cap, and you don’t budget per seat. You instrument. Six to twelve weeks of real use, logged against the right denominators, and by the end of it you know which teams convert tokens into outcomes and which teams convert tokens into nothing. This is not sophisticated. Tag the call or use separate API keys, name the team, point at the artifact where one exists, and let the ratios accumulate for a couple of months. Then you budget by outcome, not by seat. An engineering org that ships features gets more tokens. An engineering org that runs experiments nobody uses gets a conversation.
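What "tag the call, name the team, point at the artifact" could look like in practice, as a minimal sketch. The record shape, team names, and artifact IDs are all hypothetical, and the token counts would come from whatever usage data your provider or gateway already reports. Nothing here calls a real API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class UsageRecord:
    """One logged model call. All fields are illustrative."""
    team: str
    tokens: int
    artifact: Optional[str] = None  # e.g. a PR or ticket id; None if there is none


def summarize(records: Iterable[UsageRecord]) -> Dict[str, dict]:
    """Total tokens, distinct artifacts, and tokens per artifact for each team."""
    totals = defaultdict(int)
    artifacts = defaultdict(set)
    for record in records:
        totals[record.team] += record.tokens
        if record.artifact is not None:
            artifacts[record.team].add(record.artifact)

    summary = {}
    for team, total in totals.items():
        count = len(artifacts[team])
        summary[team] = {
            "tokens": total,
            "artifacts": count,
            "tokens_per_artifact": total / count if count else None,
        }
    return summary


# Hypothetical log: one team ties calls to PRs, the other does not.
log = [
    UsageRecord("payments", 400_000, "PR-101"),
    UsageRecord("payments", 200_000, "PR-101"),
    UsageRecord("payments", 600_000, "PR-102"),
    UsageRecord("growth", 900_000),
    UsageRecord("growth", 300_000),
]

for team, stats in summarize(log).items():
    print(team, stats)
```

In this toy log both teams spend the same 1.2 million tokens. Only one of them can say what it bought.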
This only works if you start before the bill forces you to. Start after, and the budget fight is already happening, and nobody’s in the mood to be patient for three months.
2015, again
In 2015 every company had an AWS bill nobody could explain. Engineering owned the decisions, finance saw the number, and the number kept doubling. The conversation was the same one happening today, some version of: why is this so high, why is nobody managing it, do we need to cap. A whole industry grew up around the problem: AWS implementation partners, cost-visibility vendors like Cloudability and Apptio, the FinOps Foundation, and the dedicated cloud architect role inside every mid-size company. The answer is what we now call FinOps. Instrument everything, tag every workload, assign ownership, report by team and by product. Five years of painful organizational change, and by 2020 cloud gross margin was a number every SaaS board tracked alongside revenue growth.
Tokens are the same arc on a faster clock. Eighteen months, maybe less. The companies that build the muscle now will be able to answer the board question when it comes. The ones that don’t will be writing down their AI P&L in 2027, wondering how it got away from them.
The unit is settling
Within 24 months, token margin will be a number every board asks for by default, the way cloud gross margin became one by 2020. The companies that can answer “tokens per what” will optimize their spend and compound. The ones that can’t will be surprised by their own P&L, and the surprise will show up in the bill long before it shows up in the product.
A few patterns show up every time I watch a company try to build the muscle. The number survives only when finance and engineering co-own it, because whichever function holds it alone pulls it in its own direction. Finance looks at tokens without grasping the meaning. Engineering looks at tokens without paying attention to the cost. And the denominator has to be a metric the team already lives by, because inventing “tokens per PR” for an org that doesn’t already care about shipped PRs is a dead ratio by quarter two.
This week Jensen Huang told Dwarkesh Patel: “The input is electrons, the output is tokens. In the middle is Nvidia.” For your company, input is tokens. Output is whatever customers pay for. You’re the middle.
Stop asking “how many tokens did we burn.” Start asking “tokens per what.” If you can’t answer that for at least one what that matters to your business, you’re not running AI, you’re just pushing Anthropic’s run rate up and to the right.

I think the end-state is a framework that compares AI spend against the next-best use of both money and time.
Not just: “Did this AI workflow create value?”
But: “Did this create more value than the same dollars and human hours would have created elsewhere?”
In other words:
(ROI on AI spend) / (ROI on alternative spend) > 1
and
(time saved or output gained) / (opportunity cost of the time redirected) > 1
If both are true, increase spend. If not, it’s just performative productivity.
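As a minimal sketch, the two-ratio test could be written like this. The function name, the parameter names, and the example figures are hypothetical, and it assumes you can put the ROI and time estimates in comparable units, which is the hard part.

```python
def should_increase_spend(
    ai_roi: float,
    alternative_roi: float,
    value_of_time_gained: float,
    opportunity_cost_of_time: float,
) -> bool:
    """True only when both ratios from the rule above exceed 1."""
    if alternative_roi <= 0 or opportunity_cost_of_time <= 0:
        raise ValueError("both denominators must be positive")
    money_ratio = ai_roi / alternative_roi
    time_ratio = value_of_time_gained / opportunity_cost_of_time
    return money_ratio > 1 and time_ratio > 1


# Hypothetical estimates: ROI as a fraction, time valued in hours.
print(should_increase_spend(0.40, 0.25, 120, 80))  # both ratios above 1
print(should_increase_spend(0.40, 0.25, 60, 80))   # time ratio below 1
```

The sketch refuses to run with a zero or negative denominator instead of guessing, because a ratio against a worthless alternative says nothing.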