
Quality Hillclimb

Formally: Quality-SGD

You do not need to teach the agent how to improve. You need gates that reject bad output and a ratchet that locks in good output. The improvement is emergent. Apply deterministic quality gates to stochastic agent output and the system climbs a quality surface you never explicitly defined.

[Figure: quality ascent path with ratchet floors, showing monotonically non-decreasing improvement]

The Insight

AI agents produce stochastic output - given the same input, they generate different results each time. Most teams try to control this by writing better prompts, fine-tuning models, or adding instructions. This is playing the game.

Quality Hillclimb designs the game instead. The agent does not need a plan for improvement. It needs gates that create a one-way valve on quality. Output that passes the gate and exceeds the current best becomes the new floor. Output that fails is rejected. The agent tries again. Over many iterations, quality ascends.

How It Works

# The loop
agent produces output (stochastic)
  → gate evaluates (deterministic)
  → if score > current_best:
      ratchet it in (new floor)
  → if score <= current_best:
      reject, agent retries
# Each accepted output is a step uphill.
# The sequence of baselines is monotonically non-decreasing.
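The loop above can be sketched in a few lines of Python. The producer and gate here are toy stand-ins, not a real agent: the "agent" emits a noisy number and the gate scores how close it lands to a target. The point is the shape of the loop, not the payload.

```python
import random

def hillclimb(produce, gate, iterations=500):
    """Gate-and-ratchet loop: stochastic producer, deterministic gate."""
    best_score = float("-inf")
    best_output = None
    for _ in range(iterations):
        output = produce()        # stochastic: a different result each call
        score = gate(output)      # deterministic: same output -> same score
        if score > best_score:    # ratchet: only strictly better is accepted
            best_score, best_output = score, output
        # otherwise reject; the producer simply tries again next iteration
    return best_output, best_score

# Toy stand-ins (illustrative only, not a real agent or gate)
produce = lambda: random.gauss(0.5, 0.2)   # noisy candidate output
gate = lambda x: -abs(x - 0.8)             # score peaks as output nears 0.8

best, score = hillclimb(produce, gate)
```

No step ever lowers `best_score`, so the sequence of accepted baselines is monotonically non-decreasing by construction.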

The Quality Ratchet

The Quality Ratchet is the primitive that makes hillclimbing possible. It is a CI-enforced floor that only moves up. Once a metric reaches a threshold, the system blocks any change that drops below it.

# Formal property
baseline_{k+1} >= baseline_k   for all k
# The sequence is monotonically non-decreasing.
# No improvement is ever lost. No regression is ever permitted.

The ratchet is the mechanism that prevents downhill steps. Without it, the stochastic process is a random walk. With it, the stochastic process is a hillclimb.
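As a CI check, the ratchet is small: read the recorded floor, fail any change that falls below it, and raise the floor whenever a change exceeds it. A minimal sketch, assuming a hypothetical `quality_baseline.json` file as the persisted ratchet state:

```python
import json
import pathlib
import sys

BASELINE_FILE = pathlib.Path("quality_baseline.json")  # hypothetical state file

def ratchet_check(metric_name: str, measured: float) -> bool:
    """CI gate: block any change that drops a metric below its recorded
    floor; raise the floor whenever a change exceeds it."""
    baselines = (json.loads(BASELINE_FILE.read_text())
                 if BASELINE_FILE.exists() else {})
    floor = baselines.get(metric_name, float("-inf"))
    if measured < floor:
        return False                       # regression: block the change
    if measured > floor:
        baselines[metric_name] = measured  # improvement: lock in the new floor
        BASELINE_FILE.write_text(json.dumps(baselines))
    return True

if not ratchet_check("coverage", 0.82):
    sys.exit(1)  # failing the ratchet fails the build
```

Because the floor only ever moves up, committing the baseline file alongside the code makes every merged improvement permanent.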

Gate Design Principles

A good gate has three properties:

Quantitative and deterministic

The gate produces a number, and the same input always produces the same number. “Does it feel good?” is not a gate. “Does test coverage exceed 80%?” is a gate.

Cheap relative to generation

If the gate costs as much to evaluate as the agent costs to generate, you are back at a Templeton Ratio of T = 1, the Verification Trap. The gate must be orders of magnitude cheaper than generation.

Correlated with actual quality

A gate that measures the wrong thing drives the wrong improvement. Goodhart's Law applies: when a measure becomes a target, it ceases to be a good measure. The gate must be validated against ground truth.
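The first two properties can be made concrete with a small sketch. The inputs (coverage, lint errors, security warnings) and the weights below are illustrative assumptions, not a prescribed formula; the third property is exactly why such weights must be validated against ground truth before being trusted.

```python
def gate_score(coverage: float, lint_errors: int,
               security_warnings: int) -> float:
    """Quantitative and deterministic: a pure function of measured inputs
    that always returns the same number for the same inputs. Weights are
    illustrative and must be validated against real quality (Goodhart)."""
    if security_warnings > 0:
        return float("-inf")        # hard constraint: any warning fails outright
    return coverage - 0.01 * lint_errors

# Determinism: the same input always produces the same number.
assert gate_score(0.85, 12, 0) == gate_score(0.85, 12, 0)
```

Each input is cheap to measure relative to generating the code, which keeps the gate on the right side of the Verification Trap.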

Why It Works: The SGD Analogy

For readers who want the formal connection: this is stochastic gradient descent on a quality loss surface, without ever computing the gradient explicitly.

# The analogy
Quality gates = gradient signal
Stochastic agent output = noise (exploration)
Quality ratchet = learning rate floor
The system performs SGD on a quality surface
without computing the gradient explicitly.
# The gates DEFINE the surface. The stochasticity EXPLORES it.
# The ratchet LOCKS IN each step uphill.

This is why it is called Quality-SGD in the formal treatment, and why it works even when you cannot write down the quality function analytically. The gates implicitly define it. The agent implicitly optimizes it.

Practical Example

AI agent writing code

Gate: tests pass + coverage >= current floor + linter clean + no security warnings.

Ratchet: each PR that passes and exceeds coverage becomes the new floor.

Result: over 100 PRs, coverage climbs from 60% to 85%. Lint violations drop from 200 to 12. Zero regression on any metric. No one wrote a “quality improvement plan.” The gates did the work.
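A sketch of this PR gate, with two ratcheted metrics matching the example: coverage may only rise and lint violations may only fall. The `floors` dict and its starting values are illustrative, not a real CI configuration.

```python
def pr_gate(tests_pass: bool, coverage: float, lint_errors: int,
            security_warnings: int, floors: dict) -> bool:
    """Composite PR gate: hard checks plus two ratcheted metrics.
    `floors` holds the ratchet state; accepted PRs tighten it."""
    if not tests_pass or security_warnings > 0:
        return False
    if coverage < floors["coverage"] or lint_errors > floors["lint"]:
        return False                                  # regression on a ratcheted metric
    # Accepted: lock in the new best values as the next PR's floor.
    floors["coverage"] = max(floors["coverage"], coverage)
    floors["lint"] = min(floors["lint"], lint_errors)
    return True

floors = {"coverage": 0.60, "lint": 200}
pr_gate(True, 0.63, 180, 0, floors)  # accepted; floors tighten to 0.63 / 180
pr_gate(True, 0.61, 190, 0, floors)  # rejected: regresses below the new floors
```

Run over many PRs, this is the mechanism behind the numbers above: each accepted PR tightens the floors, and no PR can loosen them.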

Connection to Other Frameworks

Designed Convergence - Quality Hillclimb is the single-agent instance. The gates + ratchet are the conditions for convergence.

The Performance Frontier - the gates define what counts as “uphill.” The frontier is where the hillclimb is heading.

The Promotion Protocol - the autonomy graduation criteria are quality gates. The HITL state is a gate the AI must pass to earn autonomy.

Verification Quadrant - gates only work in the Sweet Spot where verification is cheap. The Templeton Ratio determines whether a gate is economically viable.