I Have This Problem
Find the right framework, tool, or vocabulary term for your AI deployment challenge. Each answer links to the full concept with worked examples and the math behind it.
“We keep shipping improvements and then regressing. How do I make sure quality only goes up?”
A CI-enforced floor that only moves up. Each improvement becomes the new minimum.
“How do I make AI agents improve over time without writing an improvement plan?”
Ratcheted quality gates on stochastic output create emergent ascent. The agent does not need a plan.
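The whole mechanism fits in a few lines. A minimal sketch — `ratchet_gate` and the persisted floor are illustrative names, not from the lexicon:

```python
def ratchet_gate(score: float, floor: float) -> tuple[bool, float]:
    """Pass only if the new score meets the current floor; any pass
    that beats the floor becomes the new minimum."""
    if score < floor:
        return False, floor           # regression: block the release
    return True, max(floor, score)    # the ratchet only moves up
```

Run it in CI against the stored floor: the agent's only available direction is up.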
“We deployed AI but we are scared to remove the human. How do we safely give it more independence?”
A 3-state progression: Disabled, HITL, Autonomous. Promote on statistical evidence, roll back on drift.
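A sketch of the progression. The transition rule here is an assumption for illustration: promote one level on statistical evidence, demote one level on drift.

```python
from enum import Enum

class AutonomyState(Enum):
    DISABLED = 0
    HITL = 1          # human in the loop
    AUTONOMOUS = 2

def step(state: AutonomyState, evidence_ok: bool, drift: bool) -> AutonomyState:
    """Demotion on drift takes priority; promotion requires evidence."""
    if drift and state != AutonomyState.DISABLED:
        return AutonomyState(state.value - 1)
    if evidence_ok and state != AutonomyState.AUTONOMOUS:
        return AutonomyState(state.value + 1)
    return state
```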
“Nobody agrees what "good" looks like for this task. How do I define quality?”
Map the distribution of human performance, find the 99th percentile, compute the gradient toward it.
“My AI keeps doing things I don't want. I can't write a complete spec of what I want.”
Three evidence channels: structured elicitation (conjoint analysis), revealed preference (behavioral observation), and direct query (ask only when the value of the answer exceeds the cost of the operator's attention).
“Should I automate this specific task? How do I decide?”
Score across 9 dimensions. If any single dimension is a landmine (1), it is a hard no.
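The landmine rule in code. Only the any-1-is-a-hard-no rule comes from the framework; the mean-score summary is an illustrative way to report the rest:

```python
def score_task(dimensions: dict[str, int]) -> str:
    """Nine dimensions scored 1-5. Any single 1 is a landmine: hard no,
    regardless of how the other eight score."""
    if len(dimensions) != 9:
        raise ValueError("expected 9 dimensions")
    if any(score == 1 for score in dimensions.values()):
        return "hard no"
    return f"mean {sum(dimensions.values()) / 9:.1f} of 5"
```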
“Is this task a good candidate for AI? What is the ROI?”
T = time_to_do / time_to_check. High T = AI creates leverage. Low T = you are doing the work twice.
“The AI output looks good but I don't know if it is correct. Checking takes as long as doing it.”
Easy to generate, hard to verify. T approaches 1. You have added a step without saving time.
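The ratio from above as a function — a sketch; the minute-based units are an assumption, only the ratio matters:

```python
def templeton_ratio(time_to_do: float, time_to_check: float) -> float:
    """T = time_to_do / time_to_check, both in the same units."""
    return time_to_do / time_to_check

# Hard to do, easy to check: 4 hours to write, 10 minutes to verify
templeton_ratio(240, 10)   # T = 24: AI creates leverage
# Checking takes nearly as long as doing
templeton_ratio(60, 55)    # T near 1: you have added a step
```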
“How do I brainstorm business ideas that actually match real demand?”
Fix demand (immutable), vary means (mutable). Demand is a hidden force on your optimization gradient.
“How do I guarantee this initiative succeeds instead of hoping?”
Design the game so rational agents converge to your outcome. Finite state + Bayesian search + ratchet = theorem.
“What are the actual dollar costs of AI being wrong?”
Replace accuracy with dollars. Optimal threshold: theta* = C_FP / (C_FP + C_FN).
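The threshold falls out of expected cost in one line — a sketch, with the derivation in the docstring:

```python
def optimal_threshold(c_fp: float, c_fn: float) -> float:
    """Act on a positive prediction only when P(positive) > theta*.
    Derivation: acting costs (1 - p) * C_FP in expectation, not acting
    costs p * C_FN; act when p * C_FN > (1 - p) * C_FP,
    i.e. p > C_FP / (C_FP + C_FN)."""
    return c_fp / (c_fp + c_fn)

# A false positive costs $5, a false negative $45: act above 10% confidence
theta = optimal_threshold(5, 45)
```

Cheap false positives push the threshold toward zero; cheap false negatives push it toward one.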
“The task is in a bad quadrant. What capital investment moves it to a better one?”
Five moves: build a verifier, decompose, enrich inputs, constrain outputs, build a rubric.
“Where is value leaking in my business that nobody has named?”
Your chart of accounts is a directed graph. Walk the edges. The soft spots are where value leaks.
“I spend all my time in meetings and firefighting. How do I invest in systems?”
Compile time: building systems with multiplicative ROI. Runtime: executing tasks with single-period returns.
“Is this AI automation a wasting asset or a compounder?”
Models depreciate, data appreciates. The net rate determines the investment type. See also: knowledge-capital framework.
“What is the NPV of automating this task? Should I build, buy, or hire?”
Calculate NPV, IRR, and payback period. Compare to hiring, SaaS, or doing nothing. Same math, different asset class.
“How should I think about AI output I cannot observe directly? How does the agent learn what I want?”
The four evidence channels from The Deity Problem: structured-elicitation (conjoint), revealed-preference (behavioral observation), direct-query (ask when worth it), and drift-detector (posterior predictive check). See also: oracle-gradient, designers-seat.
“How do I infer what the operator actually wants without asking? How do I learn from behavior instead of instructions?”
Watch what the operator does, not what they say. Revealed preference theory (Afriat, GARP) applied to AI alignment. Cheapest evidence channel because the operator behaves naturally.
“When should the AI ask the operator a question? How do I avoid over-asking or under-asking?”
Ask only when the expected value of the answer exceeds the cost of the operator's attention. An agent that asks too many questions is poorly calibrated, not diligent.
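The ask rule as a predicate — a sketch; defining the value of information as the expected-loss reduction an answer would buy is an assumption, not necessarily the site's definition:

```python
def value_of_information(loss_if_act_now: float, loss_if_answered: float) -> float:
    """Expected-loss reduction the answer would buy (illustrative definition)."""
    return loss_if_act_now - loss_if_answered

def should_ask(voi: float, attention_cost: float) -> bool:
    """Ask only when the expected value of the answer exceeds the
    cost of the operator's attention."""
    return voi > attention_cost
```

An agent below the line stays silent and acts on its prior.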
“How do I detect when the operator's preferences have changed and the model is stale?”
A posterior predictive check on recent decisions. When the fraction of decisions the model mispredicts exceeds a threshold, trigger re-elicitation. Preferences drift; the system must detect it.
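A minimal detector under these assumptions — the 20% threshold and the decision encoding are illustrative:

```python
def drift_detected(predicted: list, actual: list, threshold: float = 0.2) -> bool:
    """Posterior predictive check: the preference model predicts each
    recent decision; drift is flagged when the miss rate exceeds
    the threshold."""
    misses = sum(p != a for p, a in zip(predicted, actual))
    return misses / len(predicted) > threshold
```

On a flag, re-elicit rather than silently trusting a stale model.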
“What does "operational alpha" mean? How is it different from just doing a good job?”
Excess return on enterprise value generated through systematic identification of mispriced edges. The directed graph finds them. The tools evaluate them.
“What is the AI Sweet Spot? When does AI create the most value?”
Hard to do, easy to check. T >> 1. The templeton-ratio measures the gap. High ratio = transformative ROI. Also: the proof-layer and gold-standard define what "correct" means.
“What should I build first - the AI system or the verification instrument?”
Build the rubric first. The gold-standard IS the verification instrument. Without it, you are measuring with a broken ruler. See: soft-spot for finding where to look.
“How do I manage the pull of real demand on my product trajectory?”
The inescapable pull of real demand on product trajectories. Map it or crash into it.
“Should I be playing the game or designing it? What is the CTO's real job?”
Design the game so self-interested agents produce the outcome you want. Most engineering is game-playing. Mechanism design is game-designing.
“The AI deployment is in the autonomy-state-machine's HITL state. When do I promote it to the Autonomous state?”
Promote after N consecutive batches below the acceptance threshold. Not one good batch; N consecutive. The construction-spread is the gap between build cost and operational value.
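The promotion criterion as code — a sketch; the defect-rate framing of "below acceptance threshold" is an assumption consistent with the language above:

```python
def should_promote(defect_rates: list[float], threshold: float, n: int) -> bool:
    """Promote HITL -> Autonomous only after n consecutive batches
    with a defect rate below the acceptance threshold."""
    if len(defect_rates) < n:
        return False
    return all(rate < threshold for rate in defect_rates[-n:])
```

One lucky batch is noise; n consecutive batches is statistical evidence.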
“How do I measure what I should invest in next for my AI operations portfolio?”
Knowledge work either compounds or depreciates. Invest in the appreciating side: verifiers, data, rubrics. Not the depreciating side: models.
The Three Layers
Frameworks tell you where to look - strategic models for finding enterprise value.
Tools tell you how to evaluate - interactive calculators for specific decisions.
Lexicon gives you the language - vocabulary that travels in meetings you are not in.