Business Finance

Goodhart's Law

Risk & Decision Science

Difficulty: ★★☆☆☆

Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.

Related Concepts in Other Trees

Mechanism design is the formal discipline that addresses Goodhart's Law: how to design games (incentive structures, scoring rules, auctions) so that when agents optimize strategically, the resulting equilibrium still achieves the designer's intended outcome (incentive compatibility)

Credit Score (Personal Finance)

Credit scores are the canonical personal-finance Goodhart example: designed to measure creditworthiness, but once consumers learned the component weights (utilization 30%, history 35%), they game the score directly (authorized-user tradelines, strategic card opens) without improving actual creditworthiness

Prerequisites (1)

incentives (lvl 1)

You just rolled out a $500/month bonus for every support rep who keeps their CSAT score above 90%. Within two quarters, average CSAT jumps from 82% to 94% - but your Churn Rate quietly climbs from 3% to 5%. The number you targeted got better. The thing you actually cared about got worse.

TL;DR:

Goodhart's Law says that once you turn a measurement into a target - especially one attached to incentives - people optimize for hitting the number rather than delivering the outcome the number was supposed to reflect. Operators must design measurement systems that resist this distortion or accept that every metric they reward will eventually degrade as a signal.

What It Is

Goodhart's Law is a single, brutal observation: when a measure becomes a target, it ceases to be a good measure.

This isn't about dishonesty. It's about rational behavior. You learned in the incentives lesson that the incentive structure determines what people actually do. Goodhart's Law is the specific failure mode that emerges when you attach incentives to a measurement: people find the cheapest path to move the number, whether or not that path delivers the underlying Value Creation you wanted.

The measure was originally useful because nobody was trying to manipulate it. It passively reflected something real. The moment you make it a target - tie Commissions to it, set Hiring Targets around it, gate promotions on it - you've changed the game. The measure now reflects effort spent optimizing the measure, not the thing it used to track.

Why Operators Care

Every P&L is managed through numbers. You can't run Operations at scale by inspecting every transaction yourself - you pick measurements, set targets, and allocate resources based on what the numbers tell you. Goodhart's Law says this entire management layer is vulnerable to silent corruption.

The P&L impact shows up in three ways:

1) Misallocated spend. If your measurements are distorted, your resource allocation decisions are built on lies. You'll pour Marketing Spend into a channel that looks productive but isn't. You'll staff up a team whose Throughput numbers are inflated.

2) Hidden Churn. A metric looks healthy while the underlying Value Creation degrades. Customers leave, but your dashboard doesn't register the damage until it compounds into a Revenue miss.

3) Broken Feedback Loops. Good Operators use Feedback Loops to learn what's working. Goodhart-corrupted metrics poison the loop - you get false positive signals, double down on the wrong things, and the real problems stay invisible.

How It Works

The mechanics follow a predictable sequence:

Step 1: You pick a measure that correlates with the outcome you want. Close Rate correlates with sales effectiveness. A low defect rate correlates with engineering quality. CSAT correlates with customer happiness.

Step 2: You attach incentives to the measure. Bonuses, quotas, Hiring Targets, public dashboards, performance reviews - anything that makes hitting the number more attractive than missing it.

Step 3: People find ways to optimize the measure that don't require optimizing the outcome. This is the key insight. There are almost always cheaper ways to move a number than doing the hard underlying work:

• Sales reps targeted on Close Rate stop pursuing large, uncertain deals. The rate improves but Pipeline Volume and Revenue shrink.
• Engineers targeted on defect rate reclassify bugs as "feature requests." The defect count drops but product quality doesn't change.
• Support reps targeted on CSAT cherry-pick easy tickets and dodge complex ones. Scores rise but hard problems go unresolved, driving Churn.
• Recruiters targeted on Time-to-Fill push mediocre candidates through faster rather than finding better ones.

Step 4: The measure diverges from the outcome. You now have a number that looks great and a reality that's getting worse. And because you're managing by the number, you don't see the divergence until something breaks visibly - a Churn spike, a Revenue miss, a quality collapse.

The underlying math is straightforward. Every measure captures some dimensions of the outcome and misses others. When you incentivize the measure, rational actors exploit the gap between what the measure captures and what you actually want. The wider that gap, the worse the distortion.
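The gap can be made concrete with a toy model. The sketch below is an illustration only: the two actions, their effort costs, and their effects on the measure and the outcome are made-up numbers, and `best_action` and `run` are hypothetical helpers. It assumes a rep spends a fixed effort budget entirely on whichever action earns the most reward per unit of effort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    effort_cost: float   # effort units per action (assumed)
    measure_gain: float  # points added to the targeted measure per action
    outcome_gain: float  # change in the real outcome per action


# Hypothetical numbers, chosen only to illustrate the gap.
ACTIONS = [
    Action("solve hard ticket", effort_cost=4.0, measure_gain=1.0, outcome_gain=3.0),
    Action("cherry-pick easy ticket", effort_cost=1.0, measure_gain=1.0, outcome_gain=0.2),
]


def best_action(actions, reward_key):
    """Return the action with the highest reward per unit of effort."""
    return max(actions, key=lambda a: getattr(a, reward_key) / a.effort_cost)


def run(budget, reward_key):
    """Spend the whole effort budget on the action that maximizes reward_key."""
    chosen = best_action(ACTIONS, reward_key)
    count = budget / chosen.effort_cost
    return chosen.name, count * chosen.measure_gain, count * chosen.outcome_gain


if __name__ == "__main__":
    for key in ("measure_gain", "outcome_gain"):
        name, measure, outcome = run(budget=40.0, reward_key=key)
        print(f"rewarding {key}: {name}, measure={measure:.1f}, outcome={outcome:.1f}")
```

With these assumed numbers, rewarding the measure sends all effort to easy tickets (measure 40, outcome 8), while rewarding the outcome sends it to hard tickets (measure 10, outcome 30). The measure looks four times better in exactly the case where the outcome is worse.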

When to Use It

Apply Goodhart's Law as a diagnostic lens in these situations:

When designing any incentive structure. Before you tie Commissions, bonuses, or targets to a metric, ask: How would a smart, rational person hit this number without delivering the outcome I actually want? If you can think of a way, so will they.

When a metric improves faster than expected. A sudden jump in CSAT, Close Rate, or Throughput, or a sudden drop in defect rate, should trigger suspicion, not celebration. Real operational improvement is usually gradual. Sharp moves often mean the measurement is being gamed, not that the underlying process improved.

When building a Scoring Model or Quality Gates. Any time you reduce a complex outcome to a number and gate decisions on that number, Goodhart applies. The more consequential the gate, the stronger the incentive to game it.

When two metrics diverge. If CSAT is up but Churn Rate is also up, one of those signals is lying. Goodhart's Law tells you which one to suspect: whichever one is being targeted.
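A divergence check like this can be automated in a few lines. The sketch below is one possible approach, not a standard method: `pct_change` and `goodhart_flag` are hypothetical names, the 5% threshold is an arbitrary assumption, and the targeted metric is assumed to be one where higher is better. The inputs are the CSAT and Churn Rate figures from the opening scenario.

```python
def pct_change(series):
    """Relative change from the first to the last observation."""
    first, last = series[0], series[-1]
    return (last - first) / first


def goodhart_flag(targeted, outcome, outcome_higher_is_better=True, threshold=0.05):
    """Flag when the targeted metric improves while the outcome metric worsens.

    Assumes higher is better for the targeted metric. The threshold is an
    arbitrary illustrative cutoff, not a calibrated value.
    """
    target_move = pct_change(targeted)
    outcome_move = pct_change(outcome)
    if not outcome_higher_is_better:
        outcome_move = -outcome_move  # a rising churn rate counts as worsening
    return target_move > threshold and outcome_move < -threshold


csat = [82, 94]   # percent, before and after the bonus
churn = [3, 5]    # percent per month, before and after the bonus

suspicious = goodhart_flag(csat, churn, outcome_higher_is_better=False)
print("Investigate the targeted metric:", suspicious)
```

A flag here is a prompt to audit the targeted metric, not proof of gaming; the two numbers can also diverge for unrelated reasons.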

Countermeasures:

• Measure multiple dimensions of the same outcome (harder to game all of them at once)
• Rotate which metrics are targeted so optimization pressure doesn't concentrate
• Pair every efficiency metric with a quality metric (Close Rate paired with Expansion Revenue, for example)
• Use auditing and Spot-Check processes on the measurement itself, not just the process being measured
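The pairing countermeasure can be written as a simple payout rule. This is a minimal sketch under stated assumptions: `bonus_payout` is a hypothetical function, the 90% CSAT target and $500 bonus come from the support example in this lesson, and using the 3% baseline Churn Rate as the guardrail ceiling is an assumption made here for illustration. How Churn is attributed to a rep or team is left open.

```python
def bonus_payout(csat, churn_rate, csat_target=0.90, churn_ceiling=0.03, bonus=500):
    """Pay the bonus only when CSAT beats its target AND churn stays at or
    below the guardrail ceiling (assumed here to be the 3% baseline)."""
    hits_target = csat > csat_target
    guardrail_holds = churn_rate <= churn_ceiling
    return bonus if (hits_target and guardrail_holds) else 0


print(bonus_payout(csat=0.94, churn_rate=0.05))  # target hit, guardrail broken
print(bonus_payout(csat=0.94, churn_rate=0.03))  # target hit, guardrail holds
```

The rule does not make gaming impossible; it only raises the cost, because moving CSAT by cherry-picking tickets no longer pays if Churn rises with it.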

Worked Examples (2)

Support team bonus drives CSAT up and customers out

You manage a support team of 8 reps handling 4,000 tickets/month. ARR is $2.4M (MRR of $200K). Current CSAT is 82%, Churn Rate is 3%/month ($6K/month in lost Revenue). You introduce a $500/month bonus for any rep maintaining CSAT above 90%.

Bonus cost: up to 8 × $500 = $4,000/month. At 3% monthly Churn Rate on $200K MRR, you're losing $6K/month in Revenue. You're hoping better CSAT reduces that Churn.
Within 3 months, average CSAT hits 94%. Reps have learned to prioritize quick-resolution tickets (password resets, billing questions) and route complex technical issues to a backlog that never gets worked. The measure moved - but the behavior change was cherry-picking, not better service.
Complex tickets (roughly 15% of volume, or 600/month) now average 9-day resolution instead of 3-day. These are your highest-value accounts. Churn Rate among this segment doubles.
Churn Rate overall rises from 3% to 5%. Monthly Revenue loss goes from $6K (3% × $200K) to $10K (5% × $200K) - an incremental $4K/month in lost Revenue. Assuming all 8 reps qualify, you're paying $4K/month in bonuses to lose an additional $4K/month in Revenue. Dollar for dollar, every bonus dollar is destroying a Revenue dollar.
Over a year: $48K in bonuses plus $48K in incremental Churn losses - $96K in total damage on a $2.4M ARR business (4% of ARR). And the CSAT dashboard is giving you a false positive signal the entire time.
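The arithmetic in this example can be reproduced in a few lines. The sketch below uses only the figures stated above and makes the same simplifications: every rep is assumed to earn the bonus, MRR is held at $200K, and the compounding effect of churned Revenue on later months is ignored. The function and variable names are illustrative.

```python
REPS = 8
BONUS_PER_REP = 500        # dollars per month
MRR = 200_000              # dollars per month (ARR of $2.4M / 12)
CHURN_BEFORE = 0.03        # monthly Churn Rate before the bonus
CHURN_AFTER = 0.05         # monthly Churn Rate after the bonus


def monthly_churn_loss(mrr, churn_rate):
    """Revenue lost to churn in one month, rounded to whole dollars."""
    return round(mrr * churn_rate)


bonus_cost = REPS * BONUS_PER_REP                          # assumes all 8 reps qualify
loss_before = monthly_churn_loss(MRR, CHURN_BEFORE)
loss_after = monthly_churn_loss(MRR, CHURN_AFTER)
incremental_loss = loss_after - loss_before
annual_damage = 12 * (bonus_cost + incremental_loss)       # ignores compounding
share_of_arr = annual_damage / (12 * MRR)

print(f"Bonus cost per month:       ${bonus_cost:,}")
print(f"Incremental churn per month: ${incremental_loss:,}")
print(f"Annual damage:              ${annual_damage:,} ({share_of_arr:.0%} of ARR)")
```

Because lost MRR stays lost in every later month, the simple 12 × $4K figure is best read as a floor on the Churn damage rather than a precise estimate.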

Insight: The CSAT metric was a decent passive signal before you weaponized it. Once it carried a financial incentive, reps optimized for the measurement (easy tickets) rather than the outcome (reducing Churn). Pairing CSAT with a Churn Rate target - or with a Spot-Check audit of ticket complexity - would have caught this.

Close Rate target that shrinks the Pipeline

Your sales team of 5 reps generates $500K/month in Revenue from a Pipeline Volume of $2M at a 25% Close Rate. You set a new target: get Close Rate to 35% to improve sales efficiency. Reps who hit 35% get accelerated Commissions (1.5x rate).

Reps immediately stop pursuing uncertain large deals. A $200K deal with 20% probability has an Expected Value of $40K. Three $30K deals at 60% probability have a combined Expected Value of $54K. The small deals win on immediate Expected Value. But the large deal carries Expansion Revenue potential - large accounts typically grow to 2-3x their initial size over the life of the account, while small accounts stay flat. Reps rationally pick the path that lifts Close Rate, sacrificing long-term Revenue for short-term ratio improvement.
After two quarters, Close Rate is 38%. Reps are earning accelerated Commissions. But Pipeline Volume dropped from $2M to $1.2M because reps are only working high-probability deals.
Revenue: $1.2M × 38% = $456K/month. Down from $500K/month. You're paying premium Commissions for less Revenue.
Worse: the large deals that reps stopped pursuing carried 2-3x the Lifetime Value because of Expansion Revenue. The Revenue mix shifted toward small, flat accounts with no growth trajectory.
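The Expected Value comparison and the before/after Revenue can be checked with a short calculation. This sketch uses only the figures from the example; `expected_value` and `monthly_revenue` are illustrative helper names, and Revenue is modeled as Pipeline Volume times Close Rate, as the example does.

```python
def expected_value(deal_size, win_probability, count=1):
    """Expected Value of `count` identical deals, in whole dollars."""
    return round(deal_size * win_probability * count)


def monthly_revenue(pipeline_volume, close_rate):
    """Revenue modeled as Pipeline Volume times Close Rate, in whole dollars."""
    return round(pipeline_volume * close_rate)


large_deal_ev = expected_value(200_000, 0.20)
small_deals_ev = expected_value(30_000, 0.60, count=3)

revenue_before = monthly_revenue(2_000_000, 0.25)
revenue_after = monthly_revenue(1_200_000, 0.38)

print(f"Large deal EV:   ${large_deal_ev:,}")
print(f"Small deals EV:  ${small_deals_ev:,}")
print(f"Revenue before:  ${revenue_before:,}/month")
print(f"Revenue after:   ${revenue_after:,}/month")
print(f"Change:          ${revenue_after - revenue_before:,}/month")
```

The Close Rate rose by 13 points while monthly Revenue fell by $44K, before counting the accelerated Commissions or the lost Expansion Revenue.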

Insight: Close Rate is a ratio. You can improve it by closing more (hard) or by shrinking the denominator (easy). Goodhart's Law predicts people will take the easy path. Pairing Close Rate with Pipeline Volume or total Revenue would force reps to keep the Pipeline full while also closing well.

Key Takeaways

✓ Every measure you target will be optimized on its own terms, not on the outcome's terms. Ask 'how would a smart person game this?' before you attach incentives.
✓ Sudden metric improvements deserve suspicion. Real operational gains are gradual; sharp moves usually mean the measurement is being gamed.
✓ The countermeasure is never a single better metric - it's multiple metrics that are hard to game simultaneously, combined with Spot-Check auditing of the measurement process itself.

Common Mistakes

✗ Assuming the fix is a better single metric. Operators who discover Goodhart's Law often try to find the 'right' metric that can't be gamed. No such metric exists. Any single number attached to incentives will be gamed. The solution is measurement portfolios, not measurement perfection.
✗ Blaming people for rational behavior. When reps game CSAT or Close Rate, the failure mode is in the system design, not in the people. They responded rationally to the incentive structure you created - which is exactly what the incentives lesson predicted. Punishing the gaming without fixing the structure just drives the gaming underground.

Practice

medium

You run a recruiting team. You set a Hiring Targets goal: fill 20 engineering roles this quarter, with a bonus pool if the team hits it. Three months later, you've filled 22 roles - but 6 of the new hires quit within 90 days. Diagnose the Goodhart failure: what did the team optimize for, what outcome did you actually want, and how would you redesign the incentive?

Hint: Think about what the recruiters would do differently if they were targeted on 90-day Churn Rate among new hires instead of fill count. Then think about whether that replacement metric would create its own Goodhart problem.

Solution:

The team optimized for fill count - the cheapest path was to push borderline candidates through rather than hold out for strong fits. The outcome you wanted was productive engineers who stay. Redesign: pair Hiring Targets with a 90-day Churn Rate among new hires and an Interview-to-Placement Ratio floor. But note that targeting the Churn Rate alone would create a new Goodhart problem - recruiters might slow hiring to avoid risky placements, tanking Time-to-Fill. You need the portfolio of measures, not a single replacement.
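The numbers in this exercise show the gap directly. The sketch below is a hypothetical scorecard, not a prescribed formula: `hiring_scorecard` and its field names are made up for illustration, and "retained hires" simply means hires still employed after 90 days.

```python
def hiring_scorecard(target, filled, quit_within_90_days):
    """Compare the targeted fill count with the retained-hire outcome."""
    retained = filled - quit_within_90_days
    return {
        "fill_target_hit": filled >= target,
        "early_attrition_rate": quit_within_90_days / filled,
        "retained_hires": retained,
        "retained_target_hit": retained >= target,
    }


card = hiring_scorecard(target=20, filled=22, quit_within_90_days=6)
print(f"Fill target hit:      {card['fill_target_hit']}")
print(f"90-day attrition:     {card['early_attrition_rate']:.0%}")
print(f"Retained hires:       {card['retained_hires']}")
print(f"Retained target hit:  {card['retained_target_hit']}")
```

On fill count the team beat the goal by 2; on retained hires it finished 4 short, with roughly 27% of new hires gone inside 90 days.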

easy

Your SaaS product ($180K ARR) has a support Quality Gate: every ticket must receive a first response within 4 hours. You find that 96% of tickets meet this gate. But you also notice Churn Rate has increased from 2% to 4% over six months. List three specific ways the 4-hour response target could be causing the Churn increase through Goodhart distortion.

Hint: Think about what 'first response' actually means versus what 'solving the customer's problem' means. What is the cheapest possible action that clears the 4-hour gate?

Solution:

(1) Reps send a template acknowledgment ('We received your ticket and are looking into it') that clears the 4-hour gate without actually starting work - resolution time balloons behind the scenes. (2) Reps prioritize speed over accuracy, giving wrong or incomplete answers that generate follow-up tickets and frustrate customers. (3) Complex tickets get a fast first-touch but then sit in a queue for days because the incentive only rewards initial response, not resolution. All three move the 4-hour metric while degrading the actual customer experience that drives Churn.

hard

Design a measurement system for GTM Teams responsible for $50K/month in Marketing Spend. The CEO wants to target Pipeline Volume as the single measure of marketing effectiveness. Using Goodhart's Law, explain why this single target is dangerous, then propose a portfolio of 3-4 metrics that would be harder to game simultaneously. For each metric, explain what gaming behavior it blocks.

Hint: Think about the cheapest way to inflate Pipeline Volume. Then think about what dimensions of Pipeline quality matter for downstream Revenue. Each metric in your portfolio should block a different gaming strategy.

Solution:

Single target danger: the cheapest way to inflate Pipeline Volume is to fill the Pipeline with low-probability deals - every inbound inquiry gets logged regardless of fit. The team shifts Marketing Spend toward high-volume, low-intent channels. Pipeline Volume looks great; Close Rate collapses downstream. Portfolio: (1) Pipeline Volume - keeps volume pressure on, blocks the team from going passive. (2) Close Rate on marketing-sourced Pipeline - blocks low-quality Pipeline because bad deals won't close, forcing the team to care about deal quality at the source. (3) Cost Per Unit (Marketing Spend per dollar of Pipeline generated) - blocks the strategy of simply spending more to inflate volume; forces efficiency. (4) Revenue from marketing-sourced deals within 90 days - ties the entire system back to Value Creation and blocks the scenario where Pipeline converts to activity but never produces Revenue. Gaming all four simultaneously is much harder than gaming any one.
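A small scorecard makes it easier to see which metric catches which gaming strategy. The sketch below is illustrative only: the `Deal` records and both deal lists are made-up data, `scorecard` is a hypothetical helper, Close Rate is computed dollar-weighted, and only the $50K Marketing Spend comes from the exercise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deal:
    amount: int                    # dollar size of the opportunity
    won: bool                      # True if the deal closed
    days_to_close: Optional[int]   # None while the deal is open or lost


def scorecard(deals, marketing_spend):
    """Compute the four portfolio metrics for marketing-sourced deals."""
    pipeline_volume = sum(d.amount for d in deals)
    won_amount = sum(d.amount for d in deals if d.won)
    revenue_90d = sum(
        d.amount
        for d in deals
        if d.won and d.days_to_close is not None and d.days_to_close <= 90
    )
    return {
        "pipeline_volume": pipeline_volume,
        "close_rate": won_amount / pipeline_volume,  # dollar-weighted
        "spend_per_pipeline_dollar": marketing_spend / pipeline_volume,
        "revenue_90d": revenue_90d,
    }


SPEND = 50_000  # monthly Marketing Spend from the exercise

# Hypothetical data: a qualified Pipeline, then the same Pipeline padded
# with twenty low-intent $20K opportunities that never close.
qualified = [Deal(40_000, True, 45), Deal(60_000, True, 80), Deal(100_000, False, None)]
padded = qualified + [Deal(20_000, False, None)] * 20

for label, deals in (("qualified", qualified), ("padded", padded)):
    s = scorecard(deals, SPEND)
    print(
        f"{label:>9}: pipeline=${s['pipeline_volume']:,} "
        f"close_rate={s['close_rate']:.0%} "
        f"spend_per_$={s['spend_per_pipeline_dollar']:.2f} "
        f"revenue_90d=${s['revenue_90d']:,}"
    )
```

In this made-up data, padding triples Pipeline Volume and even flatters spend per Pipeline dollar, while Close Rate falls from 50% to about 17% and 90-day Revenue does not move. The downstream metrics are what expose the padding, which is why the portfolio needs them.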

Connections

Direct consequence of incentives - the specific failure mode when an incentive structure rewards a measurement instead of the outcome. Connects downstream to Quality Gates (gamed when the gate score carries consequences), Scoring Models (optimized for the score rather than the underlying quality), and Feedback Loops (corrupted metrics poison the signal). Apply when designing Commissions structures, Hiring Targets, or any resource allocation tied to a targeted metric.

Disclaimer: This content is for educational and informational purposes only and does not constitute financial, investment, tax, or legal advice. It is not a recommendation to buy, sell, or hold any security or financial product. You should consult a qualified financial advisor, tax professional, or attorney before making financial decisions. Past performance is not indicative of future results. The author is not a registered investment advisor, broker-dealer, or financial planner.
