The Promotion Protocol

Formally: Autonomy State Machine

AI earns autonomy through demonstrated performance, just like an employee. The choice is not between deploying it and not deploying it. You promote it through a three-state progression with statistical graduation criteria at each step - and you roll it back when performance degrades.

Figure: Three-state autonomy machine showing Disabled to HITL to Autonomous, with promotion and rollback transitions.

The Problem

Binary thinking about AI - “it works” vs. “it does not work” - misses the entire middle ground where AI assists but does not decide. Most teams are stuck at one extreme: either they refuse to deploy because they cannot guarantee perfection, or they deploy fully autonomous and pray.

The Promotion Protocol makes the middle ground operational.

The Three States

Disabled → HITL → Autonomous

State 1: Disabled

AI output is blocked from production entirely. Used for new deployments, post-incident recovery, or when error rates exceed the safety threshold. The system is off.

Exit criteria: system passes smoke tests and human review of a sample batch confirms baseline competence.

State 2: Human-in-the-Loop (HITL)

AI produces output. A human verifies every item before it reaches production. The human is the quality gate. This is where most AI deployments should live initially - and where many should stay permanently.

Promotion criteria: error rate below the acceptance threshold for N consecutive batches. Not one good batch - N consecutive good batches.

State 3: Autonomous

AI output goes directly to production. Humans spot-check only. This state is earned, not assumed. And it is revocable.

Rollback triggers: error rate exceeds the rollback threshold, distribution shift is detected, or the drift detector fires. Rollback goes to HITL, not Disabled, unless the failure is critical enough that no AI output should reach production.
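The three states and the moves between them can be sketched as a small transition table. This is a minimal illustration, not a reference implementation: the names `State`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical, and the HITL to Disabled edge is an assumption drawn from the note that Disabled is used when error rates exceed the safety threshold.

```python
from enum import Enum


class State(Enum):
    DISABLED = "disabled"
    HITL = "hitl"
    AUTONOMOUS = "autonomous"


# Allowed moves, keyed by (from_state, to_state), with the reason for each.
# There is deliberately no DISABLED -> AUTONOMOUS entry: autonomy must be
# earned by passing through HITL first.
ALLOWED_TRANSITIONS = {
    (State.DISABLED, State.HITL): "smoke tests and sample review passed",
    (State.HITL, State.AUTONOMOUS): "graduation criteria met",
    (State.AUTONOMOUS, State.HITL): "rollback trigger fired",
    (State.AUTONOMOUS, State.DISABLED): "critical rollback",
    # Assumption: HITL can also be switched off entirely.
    (State.HITL, State.DISABLED): "error rate exceeds the safety threshold",
}


def transition(current: State, target: State) -> State:
    """Return the target state if the move is allowed, else raise ValueError."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Keeping the legal moves in one table makes the "earned, not assumed" rule checkable: any code path that tries to jump straight from Disabled to Autonomous fails loudly.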

Promotion Criteria

Promotion is earned through statistical evidence, not vibes. The criteria are explicit and measurable:

# Promotion: HITL → Autonomous
error_rate < acceptance_threshold
  for consecutive_batches >= graduation_window
  across sample_size >= graduation_sample_size

# Example: <2% error for 500 consecutive items
# across 10 consecutive daily batches

The key word is consecutive. One good batch is noise. Ten consecutive good batches is signal. The graduation window prevents promotion on a lucky streak.
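One way to express the graduation check in code is sketched below. The `Batch` type and `ready_for_promotion` function are hypothetical names; the defaults mirror the example figures (2%, 10 batches, 500 items). Treating an empty batch as a failure is an assumption, chosen so that a day with no data cannot count toward the streak.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    items: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Assumption: an empty batch counts as failing, not as a free pass.
        return self.errors / self.items if self.items else 1.0


def ready_for_promotion(
    batches: list[Batch],
    acceptance_threshold: float = 0.02,
    graduation_window: int = 10,
    graduation_sample_size: int = 500,
) -> bool:
    """Check the most recent `graduation_window` batches against the criteria."""
    if len(batches) < graduation_window:
        return False
    window = batches[-graduation_window:]
    # Every batch in the window must be below threshold: one bad batch breaks the streak.
    all_below = all(b.error_rate < acceptance_threshold for b in window)
    enough_items = sum(b.items for b in window) >= graduation_sample_size
    return all_below and enough_items
```

Because only the trailing window is inspected, a single bad batch delays promotion until it has aged out of the window, which is what "consecutive" requires.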

Rollback Triggers

Error spike: Error rate exceeds the rollback threshold. Note: the rollback threshold should be higher than the promotion threshold (hysteresis) to prevent oscillation.

Distribution shift: The input distribution has changed. The model was trained on one world and is operating in another. Even if error rates look fine on old metrics, the underlying data has moved.

Drift: The drift detector fires - operator preferences have shifted and the model is optimizing for the wrong objective.
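The three triggers can be evaluated together, as in the sketch below. Several things here are assumptions rather than part of the protocol: distribution shift is measured as KL divergence between binned input distributions (live against reference), the drift detector is reduced to a boolean supplied by the caller, and the names `kl_divergence` and `rollback_reasons` are hypothetical.

```python
import math


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-9) -> float:
    """KL(p || q) in nats for two discrete distributions over the same bins."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def rollback_reasons(
    error_rate: float,
    reference_dist: list[float],
    live_dist: list[float],
    drift_detector_fired: bool,
    rollback_threshold: float = 0.05,
    drift_threshold: float = 0.1,
) -> list[str]:
    """Return the names of every trigger that fired; an empty list means stay put."""
    reasons = []
    if error_rate > rollback_threshold:
        reasons.append("error_spike")
    # Assumption: shift is scored as KL(live || reference) over shared bins.
    if kl_divergence(live_dist, reference_dist) > drift_threshold:
        reasons.append("distribution_shift")
    if drift_detector_fired:
        reasons.append("drift")
    return reasons
```

Returning all fired triggers, rather than stopping at the first, keeps the rollback record useful for the later decision about whether to retrain or recalibrate.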

Hysteresis

The promotion threshold and the rollback threshold must be different. If they are the same, the system oscillates: promote at 2% error, roll back at 2% error, promote again, roll back again.

# Hysteresis gap
promotion_threshold = 2% error
rollback_threshold = 5% error

# The AI must be excellent to earn autonomy (2%)
# but is only demoted when it is clearly degrading (5%).
# The gap prevents oscillation.
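The effect of the gap can be seen in a toy simulation. To isolate hysteresis, this sketch promotes on a single reading and ignores the graduation window, which is a deliberate simplification; `count_transitions` and the sample error rates are hypothetical.

```python
def count_transitions(
    error_rates: list[float],
    promotion_threshold: float,
    rollback_threshold: float,
) -> int:
    """Count state changes between HITL and Autonomous over a series of readings."""
    autonomous = False
    changes = 0
    for rate in error_rates:
        if not autonomous and rate < promotion_threshold:
            autonomous = True
            changes += 1
        elif autonomous and rate > rollback_threshold:
            autonomous = False
            changes += 1
    return changes


# Error rates hovering around 2%, as they would for a borderline system.
noisy = [0.019, 0.021, 0.018, 0.022, 0.019, 0.023]

print(count_transitions(noisy, 0.02, 0.02))  # same threshold: 6 state changes
print(count_transitions(noisy, 0.02, 0.05))  # with the gap: 1 state change
```

With identical thresholds the system flips state on every reading; with the 2%/5% gap it is promoted once and stays promoted until the error rate clearly degrades.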

The Parameters

Parameter	Description	Example
acceptance_threshold	Max error rate for promotion	2%
graduation_window	Consecutive batches required	10 batches
graduation_sample_size	Minimum total items across the window	500 items
rollback_threshold	Error rate triggering demotion	5%
drift_threshold	Distribution shift sensitivity	KL divergence > 0.1
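The parameters could be bundled into a single validated configuration object, sketched here. `PromotionConfig` is a hypothetical name; the defaults are the example values from the table, and the validation rules (in particular, enforcing the hysteresis gap at construction time) are an assumption about how one might guard against misconfiguration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionConfig:
    acceptance_threshold: float = 0.02   # max error rate for promotion
    graduation_window: int = 10          # consecutive batches required
    graduation_sample_size: int = 500    # minimum items across the window
    rollback_threshold: float = 0.05     # error rate triggering demotion
    drift_threshold: float = 0.1         # KL divergence sensitivity

    def __post_init__(self) -> None:
        # Hysteresis: rollback must sit strictly above acceptance.
        if not 0.0 < self.acceptance_threshold < self.rollback_threshold < 1.0:
            raise ValueError(
                "need 0 < acceptance_threshold < rollback_threshold < 1"
            )
        if self.graduation_window < 1 or self.graduation_sample_size < 1:
            raise ValueError("graduation window and sample size must be positive")
        if self.drift_threshold <= 0.0:
            raise ValueError("drift_threshold must be positive")
```

Rejecting a config where the two thresholds are equal means the oscillation problem is caught when the system is set up, not after it starts flapping in production.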

Worked Example: Invoice Processing

Week 1: Disabled → HITL

New deployment. Smoke test on 50 sample invoices passes. System promoted to HITL. Every extraction is reviewed by an operator before posting to the ERP.

Weeks 2-5: HITL (accumulating evidence)

500 invoices processed. Error rate: 1.4% (below 2% threshold). Ten consecutive daily batches all below threshold. Graduation criteria met.

Week 6: Promoted to Autonomous

Extractions go directly to ERP. Operator reviews a 10% sample daily. Error rate holds at 1.2%.

Month 3: Rollback to HITL

New vendor with a non-standard invoice format. Distribution shift detected. Error rate spikes to 6.2%. System automatically rolled back to HITL. Operator verifies all output from the new vendor while the model is retrained.

Month 4: Re-promoted

Model retrained on new format. Error rate back to 1.1%. Ten consecutive batches pass. Re-promoted to Autonomous.
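The invoice timeline can be replayed through a compact decision function to check that the stated thresholds produce the stated transitions. This is an illustrative sketch: `next_state` and the observation keys are hypothetical, and the Month 4 item count is an assumption, since the example only states that ten consecutive batches passed.

```python
ACCEPTANCE, ROLLBACK = 0.02, 0.05
WINDOW, SAMPLE_SIZE = 10, 500

timeline = [
    ("Week 1", {"smoke_test_passed": True}),
    ("Weeks 2-5", {"error_rate": 0.014, "good_batches": 10, "items": 500}),
    ("Week 6", {"error_rate": 0.012}),
    ("Month 3", {"error_rate": 0.062, "distribution_shift": True}),
    # Assumption: the re-promotion window also covered at least 500 items.
    ("Month 4", {"error_rate": 0.011, "good_batches": 10, "items": 500}),
]


def next_state(state: str, obs: dict) -> str:
    """Apply one period's observations and return the resulting state."""
    if state == "Disabled":
        return "HITL" if obs.get("smoke_test_passed") else state
    if state == "HITL":
        graduated = (
            obs.get("error_rate", 1.0) < ACCEPTANCE
            and obs.get("good_batches", 0) >= WINDOW
            and obs.get("items", 0) >= SAMPLE_SIZE
        )
        return "Autonomous" if graduated else state
    # Autonomous: any rollback trigger sends the system back to HITL.
    if obs.get("error_rate", 0.0) > ROLLBACK or obs.get("distribution_shift"):
        return "HITL"
    return state


state = "Disabled"
history = []
for period, obs in timeline:
    state = next_state(state, obs)
    history.append((period, state))

for period, resulting_state in history:
    print(f"{period}: {resulting_state}")
```

The state recorded for each period is the state after that period's evidence is applied, so the "Weeks 2-5" row shows Autonomous because the graduation criteria were met at the end of that window, taking effect in Week 6.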

Why “Promote” Not “Deploy”

The metaphor matters. “Deploy” is binary - the system is deployed or it is not. “Promote” implies earned trust, demonstrated competence, and the possibility of demotion.

CTOs present this to boards. Board members understand promotions. “We promoted the AI from supervised to autonomous last quarter, then rolled it back when drift fired” is a sentence that parses for everyone in the room. “We adjusted the autonomy state machine transition parameters” is not.

Connection to Other Frameworks

Quality Hillclimb - the HITL review is a quality gate. The graduation criteria are the ratchet mechanism.

Dollarized Confusion Matrix - error costs determine the acceptance and rollback thresholds. The cost of a false positive vs. false negative sets the promotion criteria.

Designed Convergence - the Promotion Protocol is mechanism design for AI autonomy. The incentive structure ensures the system converges toward the right autonomy level.