Karush-Kuhn-Tucker. Optimization with inequality constraints.
You already know the equality-constraint story: at an optimum, the objective’s gradient is a linear combination of constraint gradients. KKT conditions are the inequality-constraint version of that same geometric idea—plus one extra “switch” (complementary slackness) that decides which inequalities actually matter at the solution.
KKT conditions characterize optimal solutions of constrained problems with inequalities by combining (1) stationarity of the Lagrangian gradient, (2) primal feasibility, (3) dual feasibility (μᵢ ≥ 0), and (4) complementary slackness (μᵢ·gᵢ(x*) = 0). In convex problems (with mild constraint qualifications), KKT is necessary and sufficient, and it’s the gateway to duality and many algorithms.
In many real optimization problems, constraints are not equalities like $h(x)=0$; they're inequalities like "stay inside a region" or "don't exceed a budget." A standard form is:

$$
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le 0, \quad i = 1,\dots,m \\
& h_j(x) = 0, \quad j = 1,\dots,p
\end{aligned}
$$
You already know the equality-only case: if we had only $h_j(x)=0$, then at a solution $x^*$ we often get

$$
\nabla f(x^*) + \sum_{j=1}^p \lambda_j \nabla h_j(x^*) = 0.
$$

That says: the objective gradient is "balanced" by constraint gradients.
With inequalities, some constraints matter only when they’re tight. If the constraint is loose (you are strictly inside the feasible region), it shouldn’t influence the optimum condition.
Example intuition: Suppose you must minimize a function over a disk. If the unconstrained minimizer lies inside the disk, then the boundary constraint shouldn’t push back—you just choose the unconstrained minimizer. But if the minimizer wants to leave the disk, the boundary becomes “active” and exerts a force.
KKT conditions encode exactly this: each inequality constraint is either slack at the optimum (and its multiplier is zero) or active (and its multiplier may be positive). This "either active or multiplier is zero" rule is complementary slackness.
For the inequality constraints $g_i(x) \le 0$, we introduce multipliers $\mu_i$ with the restriction $\mu_i \ge 0$.

The Lagrangian is

$$
\mathcal{L}(x,\mu,\lambda) = f(x) + \sum_{i=1}^m \mu_i g_i(x) + \sum_{j=1}^p \lambda_j h_j(x).
$$
The sign restriction $\mu_i \ge 0$ is not decoration—it is what prevents the Lagrangian from "rewarding" constraint violation. If $g_i(x) > 0$ (violated), then a nonnegative $\mu_i$ increases $\mathcal{L}$, penalizing violation.
At a candidate optimum $x^*$ (under suitable constraint qualifications), there exist multipliers $\mu^*$ and $\lambda^*$ such that:

1) Stationarity

$$
\nabla f(x^*) + \sum_{i=1}^m \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^p \lambda_j^* \nabla h_j(x^*) = 0
$$

2) Primal feasibility

$$
g_i(x^*) \le 0 \;\;\forall i, \qquad h_j(x^*) = 0 \;\;\forall j
$$

3) Dual feasibility

$$
\mu_i^* \ge 0 \;\;\forall i
$$

4) Complementary slackness

$$
\mu_i^*\, g_i(x^*) = 0 \;\;\forall i
$$
Take a moment to parse condition (4): a product equals zero, so for each inequality constraint $i$, at least one factor must be zero: either $\mu_i^* = 0$ or $g_i(x^*) = 0$.
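To make the four conditions concrete, here is a small numerical checker (a sketch; the function name, argument layout, and tolerance are my own choices):

```python
def kkt_satisfied(grad_f, gs, grad_gs, x, mu, tol=1e-8):
    """Check the four KKT conditions for min f(x) s.t. g_i(x) <= 0.

    grad_f  : callable x -> list (gradient of f)
    gs      : list of callables g_i(x) -> float
    grad_gs : list of callables x -> list (gradient of g_i)
    x, mu   : candidate point and multipliers (lists of floats)
    """
    n = len(x)
    # 1) Stationarity: grad f(x) + sum_i mu_i * grad g_i(x) = 0
    r = list(grad_f(x))
    for m, gg in zip(mu, grad_gs):
        grad = gg(x)
        for k in range(n):
            r[k] += m * grad[k]
    stationarity = all(abs(v) <= tol for v in r)
    gvals = [g(x) for g in gs]
    primal = all(v <= tol for v in gvals)                      # 2) g_i(x) <= 0
    dual = all(m >= -tol for m in mu)                          # 3) mu_i >= 0
    comp = all(abs(m * v) <= tol for m, v in zip(mu, gvals))   # 4) mu_i g_i = 0
    return stationarity and primal and dual and comp

# Example problem: min (x-2)^2 s.t. -x <= 0.
# Candidate x* = 2 with mu* = 0 passes; x = 0 with mu = 0 fails stationarity.
ok = kkt_satisfied(lambda x: [2 * (x[0] - 2)],
                   [lambda x: -x[0]],
                   [lambda x: [-1.0]],
                   x=[2.0], mu=[0.0])
bad = kkt_satisfied(lambda x: [2 * (x[0] - 2)],
                    [lambda x: -x[0]],
                    [lambda x: [-1.0]],
                    x=[0.0], mu=[0.0])
```

Running the checker on a valid candidate returns `True`; on the failed guess it returns `False` because the stationarity residual is nonzero.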
In two dimensions, imagine sliding a contour line of $f$ until it first touches the feasible set. At a smooth touching point on an active constraint boundary, the contour is tangent to the boundary, so the objective gradient is parallel to the boundary normal.
Below is a simple ASCII-style diagram (conceptual, not to scale):

```
        objective level set
        (lower is better)
            ______
           /      \
          |   x*   |  <-- first touching point
           \      /
            \____/
               |
               |  ∇g(x*) (normal to boundary)
               v
   --------------------------- constraint boundary g(x)=0
   feasible region is one side (g(x)≤0)
```

KKT makes that "first touch" precise: $-\nabla f(x^*)$ is balanced by a nonnegative combination of active constraint normals.
<svg xmlns="http://www.w3.org/2000/svg" width="560" height="240" viewBox="0 0 560 240">
<rect width="560" height="240" fill="white"/>
<!-- Feasible half-plane (g(x) <= 0) -->
<polygon points="60,200 520,120 520,230 60,230" fill="#e8f3ff" stroke="none"/>
<!-- Boundary line g(x)=0 -->
<line x1="60" y1="200" x2="520" y2="120" stroke="#1f4e79" stroke-width="3"/>
<text x="420" y="110" font-family="Arial" font-size="14" fill="#1f4e79">g(x)=0 (active boundary)</text>
<!-- Level set ellipse -->
<ellipse cx="260" cy="145" rx="110" ry="55" fill="none" stroke="#444" stroke-width="2"/>
<text x="150" y="70" font-family="Arial" font-size="14" fill="#444">objective level set f(x)=c</text>
<!-- Touch point x* -->
<circle cx="340" cy="130" r="5" fill="#d62728"/>
<text x="350" y="135" font-family="Arial" font-size="14" fill="#d62728">x*</text>
<!-- Normal vector to boundary (∇g) -->
<defs>
<marker id="arrow" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#1f4e79"/>
</marker>
</defs>
<line x1="340" y1="130" x2="310" y2="70" stroke="#1f4e79" stroke-width="3" marker-end="url(#arrow)"/>
<text x="235" y="75" font-family="Arial" font-size="14" fill="#1f4e79">∇g(x*)</text>
<!-- Gradient of f (∇f) opposite direction -->
<line x1="340" y1="130" x2="370" y2="190" stroke="#444" stroke-width="3" marker-end="url(#arrow)"/>
<text x="380" y="195" font-family="Arial" font-size="14" fill="#444">∇f(x*)</text>
<!-- Feasible label -->
<text x="70" y="225" font-family="Arial" font-size="14" fill="#0b3d91">feasible region (g(x)≤0)</text>
</svg>

Interpretation: at the touching point $x^*$, stationarity becomes roughly $\nabla f(x^*) + \mu \nabla g(x^*) = 0$ with $\mu \ge 0$.
Let's walk through the KKT conditions one at a time.

With equality constraints, Lagrange multipliers say: at $x^*$,

$$
\nabla f(x^*) + \sum_{j=1}^p \lambda_j \nabla h_j(x^*) = 0.
$$

That arises from the idea that you can't move in any feasible direction without increasing $f$.
For inequalities, define the Lagrangian

$$
\mathcal{L}(x,\mu,\lambda) = f(x) + \sum_{i=1}^m \mu_i g_i(x) + \sum_{j=1}^p \lambda_j h_j(x).
$$

Then stationarity is the same-looking condition:

$$
\nabla_x \mathcal{L}(x^*,\mu^*,\lambda^*) = 0.
$$

Let's expand it explicitly (this "show your work" step matters because it tells you what vectors are being balanced):

$$
\nabla_x \mathcal{L}(x,\mu,\lambda) = \nabla f(x) + \sum_{i=1}^m \mu_i \nabla g_i(x) + \sum_{j=1}^p \lambda_j \nabla h_j(x).
$$

So stationarity at $x^*$ means

$$
\nabla f(x^*) + \sum_{i=1}^m \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^p \lambda_j^* \nabla h_j(x^*) = 0.
$$

This is a vector equation in $\mathbb{R}^n$.
KKT always includes the obvious requirement: $g_i(x^*) \le 0$ for all $i$ and $h_j(x^*) = 0$ for all $j$.
It’s easy to overlook because it’s “just the constraints,” but it’s doing a lot of work: if you solve stationarity without feasibility, you can easily get points outside the feasible set.
For each inequality, KKT requires $\mu_i^* \ge 0$.

Think of $\mu_i$ as a penalty weight on constraint violation $g_i(x) > 0$. If you let $\mu_i$ be negative, then violating the constraint (making $g_i(x)$ positive) could reduce the Lagrangian. That breaks the interpretation of $\mathcal{L}$ as objective + penalties.

Geometrically, $\mu_i \nabla g_i(x)$ is a push in the direction of the constraint normal. A negative $\mu_i$ would pull you into the infeasible region.
One inequality constraint (for intuition): $\min f(x)$ subject to $g(x) \le 0$.

If the optimum is interior ($g(x^*) < 0$), then there's no reason to have $\mu > 0$, because the constraint isn't binding. KKT will force $\mu = 0$ (via complementary slackness, coming next), and stationarity reduces to $\nabla f(x^*) = 0$.

If the optimum lies on the boundary ($g(x^*) = 0$), stationarity becomes

$$
\nabla f(x^*) + \mu \nabla g(x^*) = 0
\quad\Longleftrightarrow\quad
\nabla f(x^*) = -\mu \nabla g(x^*).
$$

So $\nabla f(x^*)$ is parallel to $\nabla g(x^*)$, with nonnegative scaling ($\mu \ge 0$). That is the inequality analog of the equality "parallel gradients" picture.
With several inequalities, you do not generally have $-\nabla f(x^*)$ parallel to a single constraint normal. Instead, you get a nonnegative combination:

$$
-\nabla f(x^*) = \sum_{i \in A(x^*)} \mu_i^* \nabla g_i(x^*), \qquad \mu_i^* \ge 0.
$$

Here $A(x^*) = \{\, i : g_i(x^*) = 0 \,\}$ is the active set: constraints with $g_i(x^*) = 0$.

This means $-\nabla f(x^*)$ lies in the cone generated by the active constraint normals.
A quick two-constraint picture: if two boundaries meet at a corner, then the outward normals span a wedge-shaped cone. KKT says the negative objective gradient $-\nabla f$ must lie in that cone for the corner to be optimal.
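With exactly two active constraints in 2D, this cone test reduces to solving a 2×2 linear system for the weights and checking their signs. A sketch using Cramer's rule (the function name is my own):

```python
def cone_weights_2d(grad_f, n1, n2):
    """Solve -grad_f = mu1*n1 + mu2*n2 for (mu1, mu2) by Cramer's rule,
    where n1, n2 are the (linearly independent) active constraint normals.
    The corner passes the KKT cone test iff both weights are >= 0."""
    b = (-grad_f[0], -grad_f[1])
    det = n1[0] * n2[1] - n2[0] * n1[1]
    mu1 = (b[0] * n2[1] - n2[0] * b[1]) / det
    mu2 = (n1[0] * b[1] - b[0] * n1[1]) / det
    return mu1, mu2

# Corner of the nonnegative quadrant with f(x, y) = x + y:
# grad f = (1, 1), normals grad g1 = (-1, 0), grad g2 = (0, -1).
mu1, mu2 = cone_weights_2d((1.0, 1.0), (-1.0, 0.0), (0.0, -1.0))
```

Both weights come out nonnegative here, so $-\nabla f$ lies in the cone and the corner passes the test.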
KKT conditions are not magical; they're guaranteed under mild regularity assumptions called constraint qualifications (CQs). A common one you'll see in convex optimization is Slater's condition: there exists a strictly feasible point $\bar{x}$ with $g_i(\bar{x}) < 0$ for all $i$ (and $h_j(\bar{x}) = 0$).

You don't need to memorize every CQ right now; the practical lesson is: for convex problems with a strictly feasible point, KKT characterizes the global optima, while for general nonconvex problems KKT is (under a CQ) necessary but not sufficient.
We’ll keep this in the background as we focus on the mechanics you’ll actually use: stationarity + feasibility + complementary slackness.
Equalities are always "active" because $h_j(x) = 0$ must hold exactly. Inequalities are different: they can be tight or loose.

KKT encodes that with:

$$
\mu_i \ge 0, \qquad g_i(x^*) \le 0, \qquad \mu_i\, g_i(x^*) = 0.
$$

This trio is worth reading as a logic rule: if $g_i(x^*) < 0$ (slack), then $\mu_i = 0$; conversely, if $\mu_i > 0$, then $g_i(x^*) = 0$ (active).
This is why people describe as a “force”: it only pushes when you’re at the wall.
Define the active set at a feasible point $x$:

$$
A(x) = \{\, i : g_i(x) = 0 \,\}.
$$

Complementary slackness implies: $\mu_i^* = 0$ for every $i \notin A(x^*)$.

So stationarity effectively reduces to a smaller set of constraints:

$$
\nabla f(x^*) + \sum_{i \in A(x^*)} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^p \lambda_j^* \nabla h_j(x^*) = 0.
$$
This is the conceptual engine behind active-set methods: guess which constraints are active, solve the equality-constrained problem, then check consistency.
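Here is the guess-then-verify loop in miniature, on the one-dimensional problem min (x−2)² s.t. x ≥ 0 (a sketch; real active-set solvers organize the guessing far more cleverly):

```python
def solve_active_set_1d():
    """Active-set reasoning for: min (x-2)^2  s.t.  -x <= 0."""
    # Guess A = {}: constraint inactive, mu = 0; solve 2(x-2) = 0.
    x, mu = 2.0, 0.0
    if -x <= 0:                      # primal feasibility check
        return x, mu
    # Guess A = {1}: constraint active, x = 0; solve 2(x-2) - mu = 0 for mu.
    x = 0.0
    mu = 2 * (x - 2)
    if mu >= 0:                      # dual feasibility check
        return x, mu
    raise ValueError("no consistent active set found")

x_star, mu_star = solve_active_set_1d()
```

The first guess is already consistent, so the loop stops at x* = 2 with μ* = 0; had the unconstrained minimizer been infeasible, the second guess would have been tested instead.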
<svg xmlns="http://www.w3.org/2000/svg" width="560" height="210" viewBox="0 0 560 210">
<rect width="560" height="210" fill="white"/>
<text x="20" y="28" font-family="Arial" font-size="16" fill="#222">Complementary slackness: μᵢ gᵢ(x*) = 0</text>
<!-- Two columns: slack vs active -->
<rect x="30" y="50" width="240" height="130" rx="10" fill="#f2f2f2" stroke="#999"/>
<rect x="290" y="50" width="240" height="130" rx="10" fill="#e8f3ff" stroke="#1f4e79"/>
<text x="110" y="75" font-family="Arial" font-size="15" fill="#333">Slack constraint</text>
<text x="370" y="75" font-family="Arial" font-size="15" fill="#1f4e79">Active constraint</text>
<text x="55" y="105" font-family="Arial" font-size="14" fill="#333">gᵢ(x*) < 0</text>
<text x="55" y="130" font-family="Arial" font-size="14" fill="#333">⇒ μᵢ = 0</text>
<text x="55" y="155" font-family="Arial" font-size="14" fill="#333">No contribution to stationarity</text>
<text x="315" y="105" font-family="Arial" font-size="14" fill="#1f4e79">gᵢ(x*) = 0</text>
<text x="315" y="130" font-family="Arial" font-size="14" fill="#1f4e79">μᵢ ≥ 0 (often > 0)</text>
<text x="315" y="155" font-family="Arial" font-size="14" fill="#1f4e79">May contribute: μᵢ∇gᵢ(x*)</text>
<!-- Switch arrow -->
<defs>
<marker id="arrow2" viewBox="0 0 10 10" refX="10" refY="5" markerWidth="7" markerHeight="7" orient="auto">
<path d="M 0 0 L 10 5 L 0 10 z" fill="#555"/>
</marker>
</defs>
<line x1="270" y1="115" x2="290" y2="115" stroke="#555" stroke-width="3" marker-end="url(#arrow2)"/>
<text x="245" y="100" font-family="Arial" font-size="12" fill="#555">tighten</text>
<text x="245" y="130" font-family="Arial" font-size="12" fill="#555">to</text>
<text x="245" y="160" font-family="Arial" font-size="12" fill="#555">activate</text>
</svg>

In many applications, $g_i(x) \le 0$ is a resource limit (e.g., "cost − budget ≤ 0"). Then $\mu_i^*$ behaves like a shadow price: how much the optimal objective value would improve if you relaxed the constraint slightly.

If a constraint is slack, relaxing it doesn't change the solution locally, so the shadow price is zero: $\mu_i^* = 0$.

If a constraint is binding, relaxing it can help, so the shadow price can be positive: $\mu_i^* > 0$.
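A numeric illustration of the shadow-price reading, using min (x−2)² subject to a budget x ≤ b (the function names are mine; the case analysis is the same one used throughout this lesson):

```python
def solve_budget(b):
    """min (x-2)^2 s.t. x <= b, written as g(x) = x - b <= 0."""
    if b >= 2.0:                  # unconstrained minimizer x = 2 is feasible
        return 2.0, 0.0           # (x*, mu*): slack constraint, mu* = 0
    x = b                         # constraint binds
    mu = -2.0 * (x - 2.0)         # stationarity: 2(x-2) + mu = 0
    return x, mu

def optimal_value(b):
    x, _ = solve_budget(b)
    return (x - 2.0) ** 2

b, eps = 1.0, 1e-6
x, mu = solve_budget(b)
# Relaxing the budget by eps improves the optimum by about mu * eps:
improvement = optimal_value(b) - optimal_value(b + eps)
```

With b = 1 the constraint binds, μ* = 2, and relaxing the budget by ε improves the optimal value by roughly 2ε, matching the shadow-price interpretation.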
For a candidate $(x^*, \mu^*, \lambda^*)$: check stationarity, primal feasibility, dual feasibility, and complementary slackness in turn.

In convex problems (plus a CQ like Slater), if you find such a triple satisfying KKT, you are done: $x^*$ is globally optimal.

In nonconvex problems, a KKT point is only a candidate. Still: in the convex setting you already know, KKT is one of the most powerful "solve-by-conditions" tools you can learn.
You’re about to unlock Lagrangian Duality. KKT conditions are the bridge between the primal problem (minimize over $x$) and the dual problem (maximize over the multipliers).

The central object is the Lagrangian:

$$
\mathcal{L}(x,\mu,\lambda) = f(x) + \sum_{i=1}^m \mu_i g_i(x) + \sum_{j=1}^p \lambda_j h_j(x).
$$

For fixed multipliers, you can minimize over $x$ to get the dual function:

$$
q(\mu,\lambda) = \inf_{x} \mathcal{L}(x,\mu,\lambda).
$$

The dual then maximizes $q(\mu,\lambda)$ subject to $\mu \ge 0$.
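For the one-dimensional problem min (x−2)² s.t. −x ≤ 0, the dual function has a closed form; a quick sketch that also checks the maximizer numerically on a grid:

```python
def q(mu):
    """Dual function of min (x-2)^2 s.t. -x <= 0.
    Inner problem: minimize (x-2)^2 - mu*x over x; stationarity
    2(x-2) - mu = 0 gives x = 2 + mu/2, hence q(mu) = -mu^2/4 - 2*mu."""
    x = 2 + mu / 2
    return (x - 2) ** 2 - mu * x

# Maximize q over mu >= 0 on a coarse grid [0, 5]
grid = [k * 0.01 for k in range(501)]
mu_best = max(grid, key=q)
```

The dual maximizer is μ = 0 with q(0) = 0, which equals the primal optimum f(2) = 0: no duality gap, as strong duality promises in this convex problem.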
KKT conditions are exactly the "meeting point" where primal feasibility, dual feasibility, and complementary slackness all hold simultaneously for a primal-dual pair.

In convex optimization, if Slater's condition holds, then strong duality holds: the primal and dual optimal values coincide.

This turns KKT into a necessary and sufficient optimality characterization.
Many algorithms iteratively update both the primal variables $x$ and the dual variables $(\mu, \lambda)$.

Complementary slackness becomes a target condition: at convergence, each inequality is either tight with positive multiplier, or slack with zero multiplier.
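A minimal primal-dual sketch (dual ascent) for the disk projection problem min ½‖x−a‖² s.t. ‖x‖² ≤ 1 with a = (3, 4); the step size and iteration count are arbitrary choices:

```python
# Primal step: for fixed mu, argmin_x L(x, mu) = a / (1 + 2*mu).
# Dual step: mu <- max(0, mu + t * g(x)), nudged by the violation g(x).
a = (3.0, 4.0)
mu, t = 0.0, 0.05
for _ in range(4000):
    x = tuple(c / (1 + 2 * mu) for c in a)
    g = x[0] ** 2 + x[1] ** 2 - 1        # constraint value at current primal
    mu = max(0.0, mu + t * g)            # projected dual ascent
# Closed-form fixed point: mu* = (||a|| - 1)/2 = 2, x* = a/5 = (0.6, 0.8)
```

The iterates approach μ* = 2 and x* = (0.6, 0.8): the inequality ends tight with a positive multiplier, exactly the complementary-slackness target described above.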
If your tech-tree interface has an interactive 2D canvas, plot a disk constraint and drag the unconstrained minimizer around.

Prompt to answer while exploring: **"Is $\nabla f(x^*)$ exactly opposite to $\mu \nabla g(x^*)$? What changes when the unconstrained minimizer moves inside the disk?"**

This visual experiment is the cleanest way to feel the difference between interior solutions ($\mu^* = 0$) and boundary solutions ($\mu^* > 0$).
| Feature | Equality | Inequality |
|---|---|---|
| Multiplier sign | free (any real) | $\mu_i \ge 0$ |
| Always active? | Yes | No (active only if $g_i(x^*) = 0$) |
| Extra condition | none beyond feasibility | complementary slackness |
| Geometry | gradient balance on manifold | gradient balance using active boundary normals |
As you move to duality, you’ll repeatedly use KKT to go from a constrained primal to a dual problem you can solve more easily—or to certify optimality of a proposed solution.
Solve
$$
\begin{aligned}
\min_{x \in \mathbb{R}} \quad & f(x) = (x-2)^2 \\
\text{s.t.} \quad & g(x)= -x \le 0 \;\;\; (\text{equivalently } x \ge 0)
\end{aligned}
$$
Step 1: Write the Lagrangian.
We have one inequality constraint with multiplier μ ≥ 0:

$$
\mathcal{L}(x,\mu) = (x-2)^2 + \mu(-x).
$$
Step 2: Stationarity (derivative w.r.t. x equals 0).
Compute:

$$
\frac{\partial \mathcal{L}}{\partial x} = 2(x-2) - \mu = 0.
$$
Step 3: Primal feasibility.
Constraint is $-x^* \le 0$, i.e. $x^* \ge 0$.
Step 4: Dual feasibility.
μ* ≥ 0.
Step 5: Complementary slackness.
$$
\mu^* g(x^*) = \mu^*(-x^*) = 0.
$$
Step 6: Case analysis.
Case A: μ* = 0.
Then stationarity gives 0 = 2(x-2) ⇒ x = 2.
Check feasibility: x* = 2 ≥ 0 (OK).
Check g(x) = -2 < 0, so constraint is slack, consistent with μ=0.
Case B: x* = 0.
Then stationarity gives μ = 2(0-2) = -4, which violates μ ≥ 0.
So Case B is impossible.
Step 7: Conclude solution.
$$
x^* = 2, \quad \mu^* = 0.
$$
Insight: Because the unconstrained minimizer x=2 is already feasible (it lies in x≥0), the inequality constraint is slack at optimum, so complementary slackness forces μ*=0 and KKT reduces to the unconstrained condition ∇f=0.
Solve the constrained minimization (a projection problem):
$$
\begin{aligned}
\min_{\mathbf{x}\in\mathbb{R}^2} \quad & f(\mathbf{x}) = \tfrac12\lVert \mathbf{x} - \mathbf{a} \rVert^2 \\
\text{s.t.}\quad & g(\mathbf{x}) = \lVert \mathbf{x} \rVert^2 - 1 \le 0
\end{aligned}
$$
Step 1: Build the Lagrangian.
We have one inequality constraint with μ ≥ 0:

$$
\mathcal{L}(\mathbf{x},\mu)=\tfrac12\lVert \mathbf{x}-\mathbf{a}\rVert^2 + \mu(\lVert \mathbf{x}\rVert^2 - 1).
$$
Step 2: Compute the gradient w.r.t. x and apply stationarity.
First expand gradients carefully:

- For $\tfrac12\lVert \mathbf{x}-\mathbf{a}\rVert^2$, the gradient is $\mathbf{x}-\mathbf{a}$.
- For $\lVert \mathbf{x}\rVert^2 - 1$, the gradient is $2\mathbf{x}$.

So

$$
\nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x},\mu) = (\mathbf{x}-\mathbf{a}) + \mu(2\mathbf{x}).
$$

Setting this to zero gives $(1+2\mu)\mathbf{x} = \mathbf{a}$. Thus

$$
\mathbf{x}^* = \frac{1}{1+2\mu^*}\,\mathbf{a}.
$$
Step 3: Primal feasibility.
Constraint: $\lVert\mathbf{x}^*\rVert^2 - 1 \le 0$, i.e. $\lVert\mathbf{x}^*\rVert \le 1$.
Step 4: Complementary slackness.
$$
\mu^*(\lVert\mathbf{x}^*\rVert^2 - 1)=0.
$$
Step 5: Case A (interior): μ* = 0.
Then

$$
\mathbf{x}^* = \mathbf{a},
$$

which is feasible exactly when $\lVert\mathbf{a}\rVert \le 1$.
Step 6: Case B (boundary): $\lVert\mathbf{x}^*\rVert = 1$ with μ* ≥ 0.

We have $\mathbf{x}^* = \frac{1}{1+2\mu^*}\,\mathbf{a}$, so

$$
\lVert\mathbf{x}^*\rVert = \frac{\lVert\mathbf{a}\rVert}{1+2\mu^*} = 1
\quad\Rightarrow\quad
1+2\mu^* = \lVert\mathbf{a}\rVert.
$$
Dual feasibility μ* ≥ 0 implies $\lVert\mathbf{a}\rVert \ge 1$.
Then
$$
\mathbf{x}^* = \frac{1}{\lVert\mathbf{a}\rVert}\,\mathbf{a}.
$$
Step 7: Summarize solution.
$$
\mathbf{x}^* =
\begin{cases}
\mathbf{a}, & \lVert\mathbf{a}\rVert \le 1 \\
\mathbf{a}/\lVert\mathbf{a}\rVert, & \lVert\mathbf{a}\rVert > 1
\end{cases}
$$
Insight: This is a perfect KKT geometry example: if a lies inside the disk, the constraint is slack and μ*=0, so the minimizer is interior. If a lies outside, the optimum occurs on the boundary where the objective’s level set first touches the disk. Stationarity becomes a balance of the objective gradient (x−a) with the boundary normal 2x, and μ*>0 measures how strongly the boundary pushes back.
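The case split translates directly into code; a sketch (the function name is mine) with one interior and one boundary instance:

```python
import math

def project_to_unit_disk(a):
    """KKT solution of min 0.5||x - a||^2 s.t. ||x||^2 <= 1.
    Interior case (||a|| <= 1): x* = a, mu* = 0.
    Boundary case (||a|| > 1):  x* = a/||a||, mu* = (||a|| - 1)/2."""
    norm = math.hypot(*a)
    if norm <= 1.0:
        return tuple(a), 0.0
    return tuple(c / norm for c in a), (norm - 1.0) / 2.0

x_in, mu_in = project_to_unit_disk((0.3, 0.4))    # a inside the disk
x_out, mu_out = project_to_unit_disk((3.0, 4.0))  # a outside, ||a|| = 5
```

The interior point is returned unchanged with μ* = 0; the outside point (3, 4) lands on the boundary at (0.6, 0.8) with μ* = 2, quantifying how strongly the boundary pushes back.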
Solve
$$
\begin{aligned}
\min_{x,y} \quad & f(x,y) = x + y \\
\text{s.t.}\quad & g_1(x,y)= -x \le 0 \;(x\ge 0)\\
& g_2(x,y)= -y \le 0 \;(y\ge 0)
\end{aligned}
$$
Step 1: Lagrangian.
$$
\mathcal{L}(x,y,\mu_1,\mu_2)= x+y + \mu_1(-x) + \mu_2(-y).
$$
Step 2: Stationarity.
Compute partial derivatives:

$$
\frac{\partial \mathcal{L}}{\partial x}=1-\mu_1=0 \;\Rightarrow\; \mu_1^*=1,
\qquad
\frac{\partial \mathcal{L}}{\partial y}=1-\mu_2=0 \;\Rightarrow\; \mu_2^*=1.
$$
Step 3: Primal feasibility.
We need x ≥ 0 and y ≥ 0.
Step 4: Complementary slackness.
$$
\mu_1^*(-x^*)=0 \;\Rightarrow\; 1\cdot(-x^*)=0 \;\Rightarrow\; x^*=0,
\qquad
\mu_2^*(-y^*)=0 \;\Rightarrow\; y^*=0.
$$
Step 5: Dual feasibility.
μ₁ = 1 ≥ 0, μ₂ = 1 ≥ 0 (OK).
Step 6: Conclude.
$$
(x^*,y^*)=(0,0).
$$
Insight: At the optimum, both inequalities are active (x=0, y=0). The objective gradient is (1,1). KKT says −∇f must lie in the cone spanned by ∇g₁=(−1,0) and ∇g₂=(0,−1) with nonnegative weights—exactly what μ₁=μ₂=1 provides.
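As a sanity check, a coarse brute-force grid search over part of the feasible quadrant (an illustration, not a solution method) lands on the same corner:

```python
# min x + y over a grid covering [0, 2] x [0, 2] with step 0.02;
# tuples compare by first element, so min picks the smallest objective.
best = min((i * 0.02 + j * 0.02, i * 0.02, j * 0.02)
           for i in range(101) for j in range(101))
f_star, x_star, y_star = best
```

The grid minimum sits at the corner (0, 0) with objective value 0, matching the KKT conclusion.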
KKT extends Lagrange multipliers to inequality constraints by adding nonnegative multipliers μᵢ and the complementary slackness rule μᵢ gᵢ(x*) = 0.
The full KKT system is: stationarity (∇ₓL=0), primal feasibility (gᵢ≤0, hⱼ=0), dual feasibility (μᵢ≥0), and complementary slackness.
Complementary slackness is a per-constraint “switch”: slack constraint ⇒ μᵢ=0; positive μᵢ ⇒ constraint is active (tight).
At an optimum, −∇f(x*) lies in the cone generated by normals of active inequality constraints (plus equality constraint normals with free multipliers).
In convex optimization with a suitable constraint qualification (e.g., Slater’s condition), KKT conditions are necessary and sufficient for global optimality.
KKT conditions are the practical gateway to duality: at optimality (under strong duality), primal and dual variables satisfy KKT simultaneously.
Active-set reasoning (guess active constraints, solve, then verify) is often the most straightforward way to solve small KKT problems by hand.
Forgetting dual feasibility: allowing μᵢ < 0 for constraints written as gᵢ(x) ≤ 0 (sign conventions matter).
Applying complementary slackness incorrectly (it is μᵢ·gᵢ(x)=0, not μᵢ=0 or gᵢ(x)=0 for all i).
Treating KKT as sufficient in nonconvex problems without checking convexity/constraint qualifications (a KKT point can be a saddle or maximum).
Mixing constraint forms (e.g., writing x ≥ 0 but using μ≥0 as if it were g(x)≤0 without converting to a consistent g(x)≤0 representation).
Solve using KKT:

$$
\min_{x \in \mathbb{R}} \; x^2 \quad \text{s.t.} \quad x \le 1.
$$
Hint: The unconstrained minimizer is x=0. Check if it is feasible. Use complementary slackness to decide μ.
Constraint is x ≤ 1, so x=0 is feasible and should be optimal.
Lagrangian: L(x,μ)=x^2+μ(x-1), μ≥0.
Stationarity: dL/dx=2x+μ=0 ⇒ μ=−2x.
At x=0 ⇒ μ=0.
Primal feasibility: 0−1≤0 OK. Complementary slackness: μ(x−1)=0·(−1)=0 OK.
Thus x=0, μ=0.
Solve using KKT: minimize a smooth convex $f(x,y)$ whose unconstrained minimizer is $(1,1)$ (for concreteness, take $f(x,y) = (x-1)^2 + (y-1)^2$) subject to $x \ge 0$ and $y \ge 0$.
Hint: Convert to g₁=−x≤0, g₂=−y≤0. The unconstrained minimizer is (1,1). Which constraints are active there?
Unconstrained minimizer is (1,1), which is feasible (x≥0,y≥0). So the solution is interior with both constraints slack.
KKT: μ₁=μ₂=0 by complementary slackness since g₁(1,1)=−1<0 and g₂(1,1)=−1<0.
Stationarity reduces to ∇f=0, which holds at (1,1).
Thus (x,y)=(1,1), μ₁=μ₂=0.
Projection with a radius constraint: for given a ∈ ℝⁿ and R > 0, solve

$$
\min_{\mathbf{x}\in\mathbb{R}^n} \; \tfrac12\lVert \mathbf{x}-\mathbf{a}\rVert^2
\quad \text{s.t.} \quad \lVert\mathbf{x}\rVert^2 - R^2 \le 0.
$$

Give x* and the multiplier μ* as functions of a and R.
Hint: Use g(x)=||x||^2 − R^2 ≤ 0. Repeat Example 2 but keep R symbolic. Watch the gradient of ||x||^2.
Let $g(\mathbf{x})=\lVert\mathbf{x}\rVert^2-R^2\le 0$, with multiplier $\mu\ge 0$.

$$
\mathcal{L}(\mathbf{x},\mu)=\tfrac12\lVert\mathbf{x}-\mathbf{a}\rVert^2+\mu(\lVert\mathbf{x}\rVert^2-R^2).
$$

Stationarity: $(\mathbf{x}-\mathbf{a})+2\mu\mathbf{x}=0 \;\Rightarrow\; (1+2\mu)\mathbf{x}=\mathbf{a} \;\Rightarrow\; \mathbf{x}=\mathbf{a}/(1+2\mu)$.

Complementary slackness: $\mu(\lVert\mathbf{x}\rVert^2-R^2)=0$.

Case 1 (interior): $\mu=0 \Rightarrow \mathbf{x}=\mathbf{a}$. Feasible if $\lVert\mathbf{a}\rVert\le R$.

Case 2 (boundary): $\lVert\mathbf{x}\rVert=R$ and $\mu\ge 0$. Then $\lVert\mathbf{a}\rVert/(1+2\mu)=R \Rightarrow 1+2\mu=\lVert\mathbf{a}\rVert/R \Rightarrow \mu=\tfrac12(\lVert\mathbf{a}\rVert/R-1)$, which requires $\lVert\mathbf{a}\rVert\ge R$.

Then $\mathbf{x}=\mathbf{a}/(\lVert\mathbf{a}\rVert/R)=R\,\mathbf{a}/\lVert\mathbf{a}\rVert$.
Final:
$$
\mathbf{x}^*=
\begin{cases}
\mathbf{a}, & \lVert\mathbf{a}\rVert\le R \\
R\,\mathbf{a}/\lVert\mathbf{a}\rVert, & \lVert\mathbf{a}\rVert>R
\end{cases}
\quad\text{and}\quad
\mu^*=
\begin{cases}
0, & \lVert\mathbf{a}\rVert\le R \\
\tfrac12(\lVert\mathbf{a}\rVert/R-1), & \lVert\mathbf{a}\rVert>R.
\end{cases}
$$
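These case formulas can be checked mechanically against the KKT system; a sketch (the function name is mine) verifying a boundary instance:

```python
import math

def project_to_ball(a, R):
    """x* and mu* for min 0.5||x - a||^2 s.t. ||x||^2 - R^2 <= 0."""
    norm = math.hypot(*a)
    if norm <= R:
        return tuple(a), 0.0
    return tuple(R * c / norm for c in a), 0.5 * (norm / R - 1.0)

# Verify the KKT system on a boundary instance: a = (6, 8), R = 5.
a, R = (6.0, 8.0), 5.0
x, mu = project_to_ball(a, R)
stationarity = [xi - ai + 2 * mu * xi for xi, ai in zip(x, a)]  # should be ~0
slackness = mu * (x[0] ** 2 + x[1] ** 2 - R ** 2)               # should be ~0
```

Here ‖a‖ = 10 > R, so the solution lands on the boundary at (3, 4) with μ* = 0.5, and both the stationarity and slackness residuals vanish.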
Next steps and related nodes: Lagrangian Duality (the node this lesson unlocks).

Reinforces prerequisites: Lagrange multipliers for equality constraints; convexity and convex optimization basics.