The Deity Problem
The best way to think about AI today is that you are a deity to it. It is trying to serve you, but it cannot read your mind. It can only observe what you tell it and what you show it. The question is not whether AI will perfectly understand your preferences - it will not. The question is how it gathers evidence about what you actually want.
Why “Deity”
Think about the structural relationship between you and your AI system. You have goals, preferences, and a utility function that determines what “good” means. The AI cannot observe any of this directly. It can only watch your behavior, listen to your instructions, and ask you questions - then try to infer what you actually want.
This is the same dynamic that exists between a deity and a follower in every religious tradition. The follower cannot read the deity's mind. They can only study what was said, observe what happens, and occasionally ask for guidance. The parallel is not doctrinal - it is structural. It happens to be a useful mental model because everyone already has an intuition for it.
The standard approach to AI alignment tries to hardcode your utility function at design time. This fails for three reasons: you cannot articulate what you want upfront (preferences are latent), your priorities drift over time, and complete specification would take months and still be wrong. The Deity Problem treats your utility function as a latent variable to be estimated through interaction, not specified through documentation.
The Belief Model
The agent maintains a posterior belief over your utility function, call it U*. By Bayes' rule,

P(U* | evidence) ∝ P(evidence | U*) · P(U*)
This starts as a diffuse prior - the agent does not know what you want. It sharpens as evidence accumulates through three channels. Early on, uncertainty is large and behavior is naturally exploratory. Over time, the posterior tightens and the agent confidently pursues what it has learned you want.
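A minimal sketch of what that belief update might look like, assuming a linear utility model U(x) = w · x with a Gaussian posterior over the weights (the class name, prior scale, and noise level are all hypothetical, not part of the framework):

```python
import numpy as np

class BeliefModel:
    """Gaussian posterior over the weights of a linear utility U(x) = w . x."""

    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.mean = np.zeros(dim)              # diffuse prior: no idea what you want
        self.cov = prior_var * np.eye(dim)     # large covariance = large uncertainty
        self.noise_var = noise_var

    def update(self, x, y):
        """Condition on one observation: feature vector x, noisy rating y."""
        x = np.asarray(x, dtype=float)
        s = x @ self.cov @ x + self.noise_var  # predictive variance of y
        k = self.cov @ x / s                   # Kalman-style gain
        self.mean = self.mean + k * (y - self.mean @ x)
        self.cov = self.cov - np.outer(k, x @ self.cov)

    def uncertainty(self):
        """Total posterior uncertainty: trace of the covariance."""
        return float(np.trace(self.cov))

belief = BeliefModel(dim=3)
before = belief.uncertainty()
belief.update([1.0, 0.0, 0.5], y=2.0)  # one piece of evidence arrives
after = belief.uncertainty()
# Each observation tightens the posterior: after < before.
```

Any of the three channels below can feed this same update; they differ only in how the evidence pairs (x, y) are obtained and how much they cost the operator.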
Channel 1: Structured Elicitation
Design experiments that directly probe your preferences with maximum information per question.
The agent designs pairwise comparisons, best-worst scaling, or adaptive choice-based conjoint. It uses D-optimal experiment design - choosing the pair that maximizes expected information gain about your preference weights.
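Under a linear-Gaussian preference model, the expected information gain from asking "A or B?" grows with the posterior variance along the feature difference d = x_A − x_B, so the most informative pair is the one maximizing dᵀΣd. A sketch under that assumption (function and variable names are hypothetical):

```python
import itertools
import numpy as np

def most_informative_pair(items, cov, noise_var=1.0):
    """Pick the pairwise comparison with maximum expected information gain,
    proportional to log(1 + d' Sigma d / noise) for d = x_i - x_j."""
    best, best_gain = None, -np.inf
    for i, j in itertools.combinations(range(len(items)), 2):
        d = np.asarray(items[i]) - np.asarray(items[j])
        gain = np.log1p(d @ cov @ d / noise_var)
        if gain > best_gain:
            best, best_gain = (i, j), gain
    return best, best_gain

# Three candidate items described by feature vectors; the agent is
# uncertain mostly about the second feature dimension.
items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cov = np.diag([0.1, 5.0])          # high uncertainty on dimension 2
pair, gain = most_informative_pair(items, cov)
# The selected pair is the one that differs most along the uncertain dimension.
```

This is why structured elicitation delivers the most information per query in the table below: the agent gets to construct the question, rather than take whatever choice situation happens to occur.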
Channel 2: Revealed Preference
Watch what the operator actually does and infer preferences from behavior, not from what they say.
Based on Afriat's theorem: if observed choices satisfy the Generalized Axiom of Revealed Preference (GARP), there exists a utility function that rationalizes all of them. The agent logs every decision you make or approve, models them as choices from feasible sets, and runs consistency checks.
This channel captures what you actually value, not what you say you value. It is the cheapest evidence channel because you are doing what you would do anyway - the agent just watches.
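The GARP check itself is mechanical. A sketch, assuming each logged decision is modeled as a (prices, chosen bundle) pair in the classical revealed-preference setup (helper names are hypothetical):

```python
import numpy as np

def satisfies_garp(prices, bundles):
    """Bundle i is directly revealed preferred to j if j was affordable
    when i was chosen: p_i . x_i >= p_i . x_j. GARP fails when i is
    (transitively) revealed preferred to j, yet j was chosen when i was
    strictly cheaper: p_j . x_j > p_j . x_i."""
    p = np.asarray(prices, dtype=float)
    x = np.asarray(bundles, dtype=float)
    n = len(x)
    # Direct revealed-preference relation R[i][j].
    R = np.array([[p[i] @ x[i] >= p[i] @ x[j] for j in range(n)]
                  for i in range(n)])
    # Transitive closure (Floyd-Warshall on the boolean relation).
    for k in range(n):
        R = R | (R[:, k:k+1] & R[k:k+1, :])
    # Violation: i preferred to j, but j strictly revealed preferred to i.
    for i in range(n):
        for j in range(n):
            if R[i, j] and p[j] @ x[j] > p[j] @ x[i]:
                return False
    return True

# Consistent choices: more of a good is bought when it is cheap.
ok = satisfies_garp(prices=[[1, 2], [2, 1]], bundles=[[2, 0], [0, 2]])
# Inconsistent choices: each chosen bundle was strictly affordable when
# the other was picked, so the revealed preferences cycle.
bad = satisfies_garp(prices=[[1, 2], [2, 1]], bundles=[[0, 2], [2, 0]])
```

When the check fails, the agent does not conclude that you are irrational; as the worked example below shows, the usual fix is to add a context variable that rationalizes the apparently inconsistent choices.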
Channel 3: Direct Query
Ask the operator a natural-language question - but only when the expected improvement in decision quality exceeds the cost of their attention.
An agent that asks too many questions is not diligent - it is just poorly calibrated.
The agent simulates: “If I ask this question and get each possible answer, how much better would my policy be?” If the answer is “not much,” do not ask.
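That simulation is a standard value-of-information calculation. A sketch with hypothetical numbers (the attention cost and utility tables are illustrative, not calibrated):

```python
def value_of_information(answer_probs, utilities):
    """answer_probs[a]: P(answer a). utilities[a][i]: expected utility of
    action i if answer a turns out to be true. Returns how much asking
    first improves the best achievable expected utility (always >= 0)."""
    n_actions = len(utilities[0])
    # Acting now: best single action under the answer-marginalized belief.
    act_now = max(
        sum(p * u[i] for p, u in zip(answer_probs, utilities))
        for i in range(n_actions)
    )
    # Asking first: the agent gets to pick the best action per answer.
    ask_first = sum(p * max(u) for p, u in zip(answer_probs, utilities))
    return ask_first - act_now

ATTENTION_COST = 0.05  # hypothetical cost of one interruption

# Question A: the answer flips which action is best -> worth asking.
voi_a = value_of_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# Question B: the same action wins either way -> not worth asking.
voi_b = value_of_information([0.5, 0.5], [[1.0, 0.9], [0.8, 0.7]])
should_ask_a = voi_a > ATTENTION_COST
should_ask_b = voi_b > ATTENTION_COST
```

Question B is the key case: the agent is genuinely uncertain about the answer, but since the uncertainty does not change what it should do, asking would only spend your attention.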
Channel Selection
| Criterion | Structured Elicitation | Revealed Preference | Direct Query |
|---|---|---|---|
| Info per query | Highest | Lowest | Medium |
| Operator cost | Medium | Zero (passive) | High (interrupt) |
| Precision | High (controlled) | Low (noisy behavioral) | Medium (depends on translation) |
| Best for | Discovering new preferences | Confirming behavioral patterns | Resolving specific ambiguities |
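The table above implies a simple selection rule: passive observation is free and always on, structured elicitation runs when overall uncertainty is high, and a direct query fires only for a specific ambiguity whose VOI beats the interruption cost. A sketch with hypothetical thresholds:

```python
def select_channel(uncertainty, ambiguity_voi, attention_cost,
                   uncertainty_threshold=1.0):
    """Pick an evidence channel from the current belief state."""
    if ambiguity_voi > attention_cost:
        return "direct_query"            # resolve a specific ambiguity
    if uncertainty > uncertainty_threshold:
        return "structured_elicitation"  # discover new preferences
    return "revealed_preference"         # default: just watch

# High overall uncertainty, no single pressing question -> run a session.
channel = select_channel(uncertainty=2.0, ambiguity_voi=0.01,
                         attention_cost=0.05)
```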
Drift Detection
Preferences are not static. You change your mind, priorities shift, new constraints emerge. The agent must detect when its model of your preferences has gone stale. This is done via posterior predictive checking - the agent periodically audits its own predictions against your actual decisions.
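One simple way to run that audit, sketched here with hypothetical names and thresholds: keep a rolling window of the agent's predictions versus your actual decisions, and use the recent disagreement rate as the drift score.

```python
from collections import deque

class DriftDetector:
    """Posterior predictive check: drift score = recent rate at which
    the agent's predicted decision disagrees with the actual one."""

    def __init__(self, window=50, small=0.05, large=0.2):
        self.errors = deque(maxlen=window)  # 1 = prediction missed
        self.small, self.large = small, large

    def record(self, predicted, actual):
        self.errors.append(0 if predicted == actual else 1)

    def drift_score(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def response(self):
        s = self.drift_score()
        if s >= self.large:
            return "full_re_elicitation"   # structural break
        if s >= self.small:
            return "gentle_recalibration"  # small drift
        return "none"

d = DriftDetector(window=10)
for _ in range(9):
    d.record("warm", "warm")  # model still matches behavior
d.record("cool", "warm")      # the operator changed their mind once
# drift score 0.1 -> gentle re-calibration, not a full reset
```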
In the deity analogy: the agent notices that you changed your mind. The response is proportional to the magnitude of the change - a small drift gets gentle re-calibration, a large one triggers a full re-elicitation across all preference dimensions.
Worked Example: Home Automation Agent
Structured elicitation: 20 pairwise comparisons - warm vs cool lighting, high vs low automation, jazz vs ambient. Result: estimated part-worths for lighting warmth, temperature, music genre, and automation level.
Revealed preference: the operator overrides the thermostat to 74°F repeatedly - a revealed preference for warmer. Skips jazz on weekdays - a contextual preference. Turns the lights to max when cooking - activity-dependent. GARP check: a context variable (activity type) is needed to rationalize the choices.
Direct query: high uncertainty on the “guest mode” dimension, and VOI exceeds the attention cost. The agent asks: “When friends come over, should I switch to guest mode automatically?” Answer: “Auto is fine, but do not change the music.” Update: automation_level = high for environment, low for entertainment.
Drift: the drift score rises to 0.22. Investigation: a baby was born. Preferences shifted - quieter music, warmer temperature, dimmer lights in the evening. Structural break detected. The agent widens its prior, runs a new elicitation session on the changed dimensions, and re-converges within a week.
Properties
The Shorthand
For quick reference, the three channels and the drift detector have religious nicknames that map to the deity analogy. These are mnemonics, not the primary terminology.
Connection to Other Frameworks
The Performance Frontier - your utility function U* is itself a frontier the agent navigates toward. The three evidence channels are how it estimates the frontier's location.
Designed Convergence - The Deity Problem is mechanism design where the mechanism is preference learning. The system converges to your true preferences.
Quality Hillclimb - the quality gates can be informed by the learned preference model. What counts as “good” is defined by the posterior over U*.