
The Deity Problem

The best way to think about AI today is that you are a deity to it. It is trying to serve you, but it cannot read your mind. It can only observe what you tell it and what you show it. The question is not whether AI will perfectly understand your preferences - it will not. The question is how it gathers evidence about what you actually want.

[Figure: architecture diagram - three evidence channels feeding into the agent's belief model, with a drift-detection feedback loop]

Why “Deity”

Think about the structural relationship between you and your AI system. You have goals, preferences, and a utility function that determines what “good” means. The AI cannot observe any of this directly. It can only watch your behavior, listen to your instructions, and ask you questions - then try to infer what you actually want.

This is the same dynamic that exists between a deity and a follower in every religious tradition. The follower cannot read the deity's mind. They can only study what was said, observe what happens, and occasionally ask for guidance. The parallel is not doctrinal - it is structural. It happens to be a useful mental model because everyone already has an intuition for it.

The standard approach to AI alignment tries to hardcode your utility function at design time. This fails for three reasons: you cannot articulate what you want upfront (preferences are latent), your priorities drift over time, and complete specification would take months and still be wrong. The Deity Problem treats your utility function as a latent variable to be estimated through interaction, not specified through documentation.

The Belief Model

The agent maintains a posterior belief over your utility function:

P(U | evidence_1, ..., evidence_n)

This starts as a diffuse prior - the agent does not know what you want. It sharpens as evidence accumulates through three channels. Early on, uncertainty is large and behavior is naturally exploratory. Over time, the posterior tightens and the agent confidently pursues what it has learned you want.
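A minimal sketch of this update, assuming a single preference weight, a grid posterior, and a logistic choice model (all illustrative assumptions, not specified by the framework):

```python
import math

# Sketch: posterior over one preference weight w on a grid, updated from
# pairwise choices. Grid, prior, and choice model are illustrative.
GRID = [i / 100.0 - 3.0 for i in range(601)]   # candidate values of w

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def spread(p):
    """Posterior standard deviation - remaining uncertainty about w."""
    mean = sum(w * x for w, x in zip(GRID, p))
    return math.sqrt(sum((w - mean) ** 2 * x for w, x in zip(GRID, p)))

def update(p, feature_diff, chose_a):
    """One Bayes update from a single pairwise choice."""
    def p_choose_a(w):
        return 1.0 / (1.0 + math.exp(-w * feature_diff))
    like = [p_choose_a(w) if chose_a else 1.0 - p_choose_a(w) for w in GRID]
    return normalize([x * l for x, l in zip(p, like)])

prior = normalize([math.exp(-w * w / 2.0) for w in GRID])
posterior = prior
for diff, chose_a in [(1.0, True), (0.8, True), (1.2, True)]:
    posterior = update(posterior, diff, chose_a)

map_w = GRID[max(range(len(GRID)), key=posterior.__getitem__)]
```

Each observed choice multiplies in a likelihood and renormalizes; the spread of the posterior shrinks as evidence accumulates, which is the "tightening" described above.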

Channel 1: Structured Elicitation

Conjoint Analysis (the “sacrament”)

Design experiments that directly probe your preferences with maximum information per question.

The agent designs pairwise comparisons, best-worst scaling, or adaptive choice-based conjoint. It uses D-optimal experiment design - choosing the pair that maximizes expected information gain about your preference weights.

# Example elicitation
“Would you prefer:
(A) Ship feature X by Friday with known edge case bug
(B) Ship feature X by next Wednesday, fully tested”
# One comparison reveals the speed-vs-quality tradeoff
# in the operator's utility function.
Highest info per query · Requires operator attention · Agent controls the experiment
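The question-selection step can be sketched as expected information gain, a close cousin of D-optimal design; the single-weight grid posterior and the candidate utility gaps below are illustrative assumptions:

```python
import math

# Sketch: score candidate pairwise questions by expected information
# gain about one preference weight w, then ask the best one.
GRID = [i / 100.0 - 3.0 for i in range(601)]

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def expected_info_gain(p, feature_diff):
    """Expected entropy drop from one comparison with this utility gap."""
    like_a = [1.0 / (1.0 + math.exp(-w * feature_diff)) for w in GRID]
    p_a = sum(x * l for x, l in zip(p, like_a))        # predictive P(choose A)
    post_a = normalize([x * l for x, l in zip(p, like_a)])
    post_b = normalize([x * (1.0 - l) for x, l in zip(p, like_a)])
    return entropy(p) - (p_a * entropy(post_a) + (1.0 - p_a) * entropy(post_b))

posterior = normalize([math.exp(-w * w / 2.0) for w in GRID])
candidates = [0.1, 0.5, 2.0]                           # candidate feature gaps
best_gap = max(candidates, key=lambda d: expected_info_gain(posterior, d))
```

The agent simulates both possible answers, weighs them by their predictive probabilities, and asks the comparison whose answer is expected to shrink the posterior most.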

Channel 2: Revealed Preference

Behavioral Observation (the “scripture”)

Watch what the operator actually does and infer preferences from behavior, not from what they say.

Based on Afriat's theorem: if observed choices satisfy the Generalized Axiom of Revealed Preference (GARP), there exists a utility function that rationalizes all of them. The agent logs every decision you make or approve, models them as choices from feasible sets, and runs consistency checks.

This channel captures what you actually value, not what you say you value. It is the cheapest evidence channel because you are doing what you would do anyway - the agent just watches.

# Revealed preference in practice
Operator overrides thermostat to 74F three evenings in a row
# Revealed: prefers warmer than the model predicted
Operator skips jazz playlist 60% of weekday evenings
# Revealed: jazz preference is contextual, not universal
Cheapest per observation · Noisiest signal · Passive - zero operator cost
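A toy version of the consistency check, assuming choices are logged as (chosen option, feasible set). "a revealed-preferred to b" when a was chosen while b was available; a cycle in that relation means no single utility function rationalizes the log. The log format and examples are illustrative, not the framework's actual data model:

```python
# Toy GARP-style consistency check over logged choices.

def revealed_over(log):
    """Edges (a, b): a was chosen while b was feasible."""
    return {(chosen, other)
            for chosen, feasible in log
            for other in feasible if other != chosen}

def has_violation(edges):
    """True if the revealed-preference relation contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reaches(node, target, seen):
        return any(nxt == target or
                   (nxt not in seen and reaches(nxt, target, seen | {nxt}))
                   for nxt in graph.get(node, ()))
    return any(reaches(b, a, {b}) for a, b in edges)

consistent = [("74F", {"72F", "74F"}), ("74F", {"70F", "74F"})]
conflicted = [("jazz", {"jazz", "ambient"}), ("ambient", {"jazz", "ambient"})]
```

The conflicted log is the jazz reversal from the examples above: adding a context variable (weekday vs. weekend) splits it into two logs that each pass the check.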

Channel 3: Direct Query

Ask When It Is Worth It (the “prayer”)

Ask the operator a natural-language question - but only when the expected improvement in decision quality exceeds the cost of their attention.

An agent that asks too many questions is not diligent - it is just poorly calibrated.

# The query threshold
VOI(question) = E[improvement in policy from answer]
c_query = cost of operator's attention
Only ask when: VOI(question) > c_query

The agent simulates: “If I ask this question and get each possible answer, how much better would my policy be?” If the answer is “not much,” do not ask.

Highest latency · Resolves ambiguities · Explicit attention cost
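The threshold can be made concrete with a small expected-value calculation; all utilities and costs below are made-up numbers for illustration:

```python
# Sketch of the VOI > c_query gate. All numbers are illustrative.

def value_of_information(answer_probs, best_utility_per_answer, utility_now):
    """E[utility of the best policy after hearing the answer] minus the
    utility of acting immediately under the current posterior."""
    expected_after = sum(p * u for p, u in
                         zip(answer_probs, best_utility_per_answer))
    return expected_after - utility_now

def should_ask(voi, c_query):
    return voi > c_query

voi = value_of_information(
    answer_probs=[0.5, 0.5],             # predictive over the two answers
    best_utility_per_answer=[0.9, 0.8],  # best policy under each answer
    utility_now=0.6,                     # best policy without asking
)
# voi = 0.25: ask if attention costs 0.1, stay quiet if it costs 0.3
```

Note that the answer probabilities come from the agent's own posterior: a question whose answer it can already predict has near-zero VOI and never clears the gate.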

Channel Selection

Criterion      | Structured Elicitation      | Revealed Preference            | Direct Query
Info per query | Highest                     | Lowest                         | Medium
Operator cost  | Medium                      | Zero (passive)                 | High (interrupt)
Precision      | High (controlled)           | Low (noisy behavioral)         | Medium (depends on translation)
Best for       | Discovering new preferences | Confirming behavioral patterns | Resolving specific ambiguities

Drift Detection

Preferences are not static. You change your mind, priorities shift, new constraints emerge. The agent must detect when its model of your preferences has gone stale. This is done via posterior predictive checking - the agent periodically audits its own predictions against your actual decisions.

# Drift score (the “heresy detector”)
drift = (1/N) * sum(predicted_choice != actual_choice)
# Response levels:
drift < 0.1   Model tracking well. Continue observing.
drift 0.1-0.3 Some shift. Schedule an elicitation session.
drift > 0.3   Major shift. Widen posterior. Re-elicit.

In the deity analogy: the agent notices that you changed your mind. The response is proportional to the magnitude of the change - a small drift gets gentle re-calibration, a large one triggers a full re-elicitation across all preference dimensions.
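The drift score and graded response can be sketched directly; the threshold values come from the levels above, while the decision labels and sample predictions are illustrative:

```python
# Drift score over a recent window of (predicted, actual) decisions,
# with the response thresholds from the text.

def drift_score(predicted, actual):
    """Fraction of recent decisions the model got wrong."""
    return sum(p != a for p, a in zip(predicted, actual)) / len(predicted)

def respond(drift):
    if drift < 0.1:
        return "continue observing"
    if drift <= 0.3:
        return "schedule elicitation session"
    return "widen posterior and re-elicit"

predicted = ["warm", "warm", "jazz", "dim", "warm",
             "jazz", "dim", "warm", "dim", "warm"]
actual    = ["warm", "cool", "ambient", "dim", "warm",
             "jazz", "dim", "warm", "dim", "warm"]
drift = drift_score(predicted, actual)   # 2 mismatches of 10 -> 0.2
```

A window keeps the score responsive: one bad week moves it, but a single surprising decision does not trigger a full re-elicitation.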

Worked Example: Home Automation Agent

Structured elicitation (onboarding)

20 pairwise comparisons: warm vs cool lighting, high vs low automation, jazz vs ambient. Result: estimated part-worths for lighting warmth, temperature, music genre, automation level.

Revealed preference (ongoing)

Operator overrides thermostat to 74F repeatedly - revealed preference for warmer. Skips jazz on weekdays - contextual preference. Turns lights to max when cooking - activity-dependent. GARP check: needs a context variable (activity type) to rationalize.

Direct query (targeted)

High uncertainty on “guest mode” dimension. VOI > attention cost. Agent asks: “When friends come over, should I switch to guest mode automatically?” Answer: “Auto is fine, but do not change the music.” Update: automation_level = high for environment, low for entertainment.

Drift detection (month 3)

Drift score rises to 0.22. Investigation: a baby was born. Preferences shifted - quieter music, warmer temperature, dimmer lights in the evening. Structural break detected. Agent widens prior, runs new elicitation session on changed dimensions. Re-converges within a week.

Properties

No spec needed - Deploy on day one with a diffuse prior. The agent learns through interaction, not months of requirements gathering.
Handles drift - The drift detector catches preference changes. The re-elicitation mechanism responds. The agent is always updating.
Explicit cost - Every elicitation and query has a measurable cost. The agent optimizes the information-to-attention ratio. It interrupts only when it is worth it.
Falsifiable - The learned model predicts future choices. If predictions are wrong, the model is updated. No unfalsifiable “the model knows what you want.”
Decomposable - Each preference dimension has its own posterior. Drift in one dimension does not contaminate others. Re-elicitation is targeted.

The Shorthand

For quick reference, the three channels and drift detector have religious nicknames that map to the deity analogy. These are mnemonics, not the primary terminology:

Sacrament = structured elicitation
Scripture = revealed preference
Prayer = direct query
Heresy = drift detection

Connection to Other Frameworks

The Performance Frontier - your utility function U* is itself a frontier the agent navigates toward. The three evidence channels are how it estimates the frontier's location.

Designed Convergence - The Deity Problem is mechanism design where the mechanism is preference learning. The system converges to your true preferences.

Quality Hillclimb - the quality gates can be informed by the learned preference model. What counts as “good” is defined by the posterior over U*.