Lagrangian Duality


Dual problems, weak and strong duality. Dual ascent.


Core Concepts

  • Lagrangian function $L(x, \lambda, \nu)$: the scalar combining objective and constraint terms via multipliers
  • Dual objects: the dual function $d(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$ and the derived dual optimization problem (maximize $d$ over the multipliers, with $\lambda \ge 0$ for inequality constraints)

Key Symbols & Notation

  • $L(x, \lambda, \nu)$: the Lagrangian
  • $d(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$: the dual function

Essential Relationships

  • Weak and strong duality: for any feasible multipliers, $d(\lambda,\nu)$ is a lower bound on the primal optimum (weak duality); under convexity plus a constraint qualification (e.g., Slater), the dual optimum equals the primal optimum (strong duality)
  • Dual ascent connection: the (sub)gradient of $d$ with respect to the multipliers equals the constraint residuals, yielding the iterative update $\lambda \leftarrow \Pi_{\ge 0}(\lambda + \alpha \cdot \text{residual})$ to maximize $d$
All Concepts (13)

  • Lagrangian (as a function of primal variables x and multiplier vector λ) used to form a lower bound on the primal objective via inequality multipliers
  • Lagrangian dual function q(λ) defined as the infimum (over primal variables) of the Lagrangian: q(λ) = inf_x L(x,λ)
  • Lagrangian dual problem: maximize q(λ) subject to dual feasibility (λ ≥ 0 for inequality constraints)
  • Duality gap: the nonnegative difference between primal optimal value and dual optimal value
  • Weak duality: the principle that any dual feasible value is a lower bound on the primal objective (so dual optimum ≤ primal optimum)
  • Strong duality: the situation (often under convexity + a constraint qualification such as Slater's condition) where primal and dual optima coincide (zero duality gap)
  • Slater's condition (a sufficient constraint qualification for strong duality in convex problems): existence of a strictly feasible primal point
  • Concavity of the dual function q(λ) in λ (regardless of primal convexity)
  • Dual variables interpreted as shadow prices (economic/interpretational meaning of λ components)
  • Subgradient structure of the dual function: relationship between minimizers of L(x,λ) and subgradients of q
  • Primal recovery via argmin of the Lagrangian: x*(λ) ∈ argmin_x L(x,λ) gives candidate primal solutions
  • Dual ascent algorithm: iterative (sub)gradient-ascent on q(λ) to find maximizing λ, alternating minimization in x and multiplier updates
  • Projection onto the nonnegative orthant used in updates to enforce λ ≥ 0

Teaching Strategy

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

Constrained optimization feels like juggling: you want to minimize an objective while never dropping the constraints. Lagrangian duality turns that juggling act into a different game—choose prices (multipliers) for constraint violation so that the best “priced” solution automatically respects the constraints. The surprising part: those prices also generate guaranteed lower bounds on the true optimum, and in many important cases the bound becomes exact.

TL;DR:

Given a primal minimization problem with constraints, form the Lagrangian $L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$ with $\lambda \ge 0$. The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimum (weak duality). The dual problem maximizes this bound over the multipliers. Under conditions like Slater's condition for convex problems, the bound is tight (strong duality). Dual ascent updates multipliers to improve the bound and can solve large problems by exploiting separability and the structure of $\inf_x L$.

What Is Lagrangian Duality?

The primal problem (what we start with)

A standard constrained minimization problem (the primal) is:

$$\begin{aligned} \text{minimize}_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \; i=1,\dots,m \\ & h_j(x) = 0, \; j=1,\dots,p \end{aligned}$$

Here:

  • $x$ is the decision variable (can be a scalar, vector, or structured object).
  • $f(x)$ is the objective.
  • $g_i(x) \le 0$ are inequality constraints.
  • $h_j(x) = 0$ are equality constraints.

You already know KKT conditions and Lagrange multipliers. Duality builds on those ideas but changes the emphasis: instead of directly enforcing constraints, we price violations.

The Lagrangian (the key scalar object)

Define the Lagrangian:

$$L(x,\lambda,\nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x)$$

  • $\lambda \in \mathbb{R}^m$ are multipliers for the inequalities.
  • $\nu \in \mathbb{R}^p$ are multipliers for the equalities.
  • For inequality constraints, we restrict $\lambda \ge 0$ (componentwise). This is not cosmetic: it makes the "pricing" consistent.

Intuition:

  • If $g_i(x) \le 0$ is satisfied, then $\lambda_i g_i(x) \le 0$ (since $\lambda_i \ge 0$), so the constraint term can only decrease (or leave unchanged) the Lagrangian.
  • If the constraint is violated ($g_i(x) > 0$), then $\lambda_i g_i(x) > 0$ whenever $\lambda_i > 0$, so the Lagrangian penalizes that violation.

This mirrors a familiar idea: “soft constraints” with penalties. But duality is more precise: the penalties are variables you optimize.

The dual function: best value of the priced problem

Given multipliers $(\lambda,\nu)$, consider minimizing the Lagrangian over $x$:

$$d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$$

This $d$ is called the dual function.

Why an infimum (not necessarily a minimum)? Because for some multipliers the Lagrangian may not attain its minimum, or may be unbounded below (the infimum is $-\infty$).

Two important facts (we will prove them carefully soon):

  1. For any $\lambda \ge 0$, $d(\lambda,\nu)$ is a lower bound on the primal optimal value $p^*$.
  2. $d(\lambda,\nu)$ is concave in $(\lambda,\nu)$, even when the primal problem is not convex.

The dual problem: maximize the best lower bound

If each $(\lambda,\nu)$ gives a lower bound, we should pick the best one:

$$\begin{aligned} \text{maximize}_{\lambda,\nu} \quad & d(\lambda,\nu) \\ \text{subject to} \quad & \lambda \ge 0 \end{aligned}$$

This is the Lagrange dual problem. Its optimal value is denoted $d^*$.

Why duality is worth learning

Duality is not just “another way to write the same thing.” It gives:

  • Certificates (bounds): Any feasible dual point $(\lambda,\nu)$ immediately certifies a lower bound on $p^*$. This is extremely useful in algorithms and in debugging optimization models.
  • Alternative computation: Sometimes the primal is hard, while the dual is simpler (fewer variables, decomposes, or has a closed form).
  • Structure and insight: Multipliers often have meaning as shadow prices (marginal values of constraints).
  • Algorithms: Dual ascent, subgradient methods on the dual, ADMM, and many distributed optimization methods are built from the dual view.

We’ll proceed slowly: first we’ll make the “lower bound” statement completely concrete; then we’ll talk about when the bound becomes tight; then we’ll turn that into an algorithm.

Core Mechanic 1 — The Dual Function as an Infimum (and Why It Lower-Bounds the Primal)

Step 1: Fix multipliers and look at L(x,λ,ν)L(x,\lambda,\nu)

For a fixed $(\lambda,\nu)$ with $\lambda \ge 0$, the Lagrangian is just a scalar function of $x$:

$$L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).$$

Now imagine two categories of $x$:

  • Feasible: satisfies $g_i(x) \le 0$ and $h_j(x) = 0$.
  • Infeasible: violates at least one constraint.

For feasible $x$, we can compare $L(x,\lambda,\nu)$ and $f(x)$.

Step 2: For feasible $x$, the Lagrangian never exceeds the objective

If $x$ is feasible, then:

  • Each $g_i(x) \le 0$ and each $\lambda_i \ge 0$, so $\lambda_i g_i(x) \le 0$.
  • Each $h_j(x) = 0$, so $\nu_j h_j(x) = 0$.

Therefore:

$$L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x) \le f(x).$$

This inequality is the heart of weak duality.

Step 3: Take the best (lowest) Lagrangian value over all $x$

By definition,

$$d(\lambda,\nu) = \inf_x L(x,\lambda,\nu).$$

The infimum over all $x$ is certainly no larger than the value at any particular feasible $x$:

$$d(\lambda,\nu) \le L(x,\lambda,\nu) \le f(x) \quad \text{for any feasible } x.$$

Now take the infimum over all feasible $x$ on the right-hand side. The best feasible objective value is the primal optimal value $p^*$:

$$d(\lambda,\nu) \le p^* \quad \text{for all } \lambda \ge 0.$$

This is weak duality.

Weak duality: For a primal minimization problem, any dual-feasible $(\lambda,\nu)$ gives a lower bound on the primal optimal value.

Pause point #1 (tiny sanity-check)

Before moving on, check that you can literally compute a dual function once.

Consider the 1D problem:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x^2 \\ \text{subject to} \quad & x \ge 1 \end{aligned}$$

Rewrite the constraint as $g(x) = 1 - x \le 0$.

  • Lagrangian: $L(x,\lambda) = x^2 + \lambda(1-x)$ with $\lambda \ge 0$.
  • Dual function: $d(\lambda) = \inf_x \big(x^2 + \lambda(1-x)\big)$.

Try computing $d(\lambda)$ by minimizing over $x$ (complete the square or set the derivative to zero). Then compare $\max_{\lambda \ge 0} d(\lambda)$ with the primal optimum $p^* = 1$.

(We'll do a closely related worked example later, but you should attempt this now: this is the first "feel" of $\inf_x$.)
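If you want to check your answer numerically, here is a minimal sketch in Python; the closed-form inner minimizer $x^*(\lambda) = \lambda/2$ is the one you should find by hand, and the name `dual_fn` is illustrative:

```python
import numpy as np

def dual_fn(lam):
    # d(lam) = inf_x [x^2 + lam*(1 - x)]; the quadratic is minimized
    # at x = lam/2, giving d(lam) = lam - lam^2/4.
    x_star = lam / 2.0
    return x_star ** 2 + lam * (1.0 - x_star)

# Weak duality: every d(lam) with lam >= 0 lower-bounds p* = 1 ...
lams = np.linspace(0.0, 4.0, 101)
assert np.all(dual_fn(lams) <= 1.0 + 1e-12)

# ... and the bound is tight at lam* = 2, where d(2) = 1 = p*.
print(dual_fn(2.0))  # 1.0
```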

The duality gap

Define:

  • Primal optimal value: $p^*$
  • Dual optimal value: $d^*$

Weak duality implies:

$$d^* \le p^*.$$

The difference $p^* - d^*$ is the duality gap.

  • If $p^* = d^*$, duality is tight and we have strong duality.
  • If $p^* > d^*$, there is a positive gap.

Why the dual function is concave (important for optimization)

This may feel counterintuitive: $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is an infimum of functions affine in $(\lambda,\nu)$. But the pointwise infimum of affine functions is concave.

Proof sketch (worth internalizing): for each fixed $x$, define

$$\phi_x(\lambda,\nu) = L(x,\lambda,\nu).$$

As a function of $(\lambda,\nu)$, $\phi_x$ is affine (linear plus a constant) because it equals $f(x) + \lambda^T g(x) + \nu^T h(x)$.

Then

$$d(\lambda,\nu) = \inf_x \phi_x(\lambda,\nu).$$

The pointwise infimum of affine functions is concave. Concretely, for $0 \le t \le 1$:

$$\begin{aligned} d(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2)) &= \inf_x \phi_x(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2)) \\ &= \inf_x \left(t\,\phi_x(\lambda_1,\nu_1)+(1-t)\,\phi_x(\lambda_2,\nu_2)\right) \\ &\ge t\inf_x \phi_x(\lambda_1,\nu_1) + (1-t)\inf_x \phi_x(\lambda_2,\nu_2) \\ &= t\, d(\lambda_1,\nu_1) + (1-t)\, d(\lambda_2,\nu_2). \end{aligned}$$

That inequality step is the key: the infimum of a sum is $\ge$ the sum of the infima.
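The concavity inequality is easy to spot-check numerically; a minimal sketch using the closed-form dual $d(\lambda) = \lambda - \lambda^2/4$ of the pause-point problem (minimize $x^2$ subject to $x \ge 1$):

```python
import numpy as np

def d(lam):
    # Closed-form dual of: minimize x^2 subject to x >= 1.
    return lam - lam ** 2 / 4.0

rng = np.random.default_rng(0)

# Check d(t*a + (1-t)*b) >= t*d(a) + (1-t)*d(b) on random samples.
for _ in range(1000):
    a, b = rng.uniform(0.0, 10.0, size=2)
    t = rng.uniform()
    lhs = d(t * a + (1 - t) * b)
    rhs = t * d(a) + (1 - t) * d(b)
    assert lhs >= rhs - 1e-9

print("concavity inequality holds on all sampled points")
```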

Interactive-canvas moment: visualize “lower bound curves”

If your tech tree platform supports an interactive plot, this is the perfect place to attach it.

Suggested interaction:

  • Pick a simple primal (like minimizing a quadratic with a linear inequality).
  • For each $\lambda \ge 0$, compute $d(\lambda)$.
  • Plot the curve $d(\lambda)$ and show the horizontal line at $p^*$.

What you should see:

  • For small $\lambda$, the penalty is weak; the infimum may be far below $p^*$.
  • As $\lambda$ changes, $d(\lambda)$ rises (not necessarily monotonically, but it has a concave "cap" shape).
  • At the maximizing $\lambda^*$, $d(\lambda^*)$ is the best lower bound; in strong-duality cases it meets $p^*$.

This visualization makes the definition $d(\lambda) = \inf_x L(x,\lambda)$ feel less abstract: you are literally pushing up the best guaranteed lower bound by tuning prices.
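Without a plotting canvas, the same picture can be computed on a grid; a minimal sketch for the primal minimize $(x-3)^2$ subject to $x \le 1$ (so $p^* = 4$; the grid bounds are illustrative):

```python
import numpy as np

# Primal: minimize (x - 3)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
# The primal optimum is x* = 1 with p* = 4.
p_star = 4.0

xs = np.linspace(-10.0, 10.0, 20001)   # grid over x for the inner infimum

def d(lam):
    # d(lam) = inf_x [(x - 3)^2 + lam*(x - 1)], approximated on the grid.
    return np.min((xs - 3.0) ** 2 + lam * (xs - 1.0))

lams = np.linspace(0.0, 8.0, 81)
vals = np.array([d(l) for l in lams])

assert np.all(vals <= p_star + 1e-9)   # every d(lam) is a valid lower bound
best = lams[np.argmax(vals)]
print(best, d(best))                   # best bound near lam* = 4, where d meets p*
```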

Core Mechanic 2 — Strong Duality: When the Lower Bound Becomes Exact

Weak duality always holds, but strong duality is where duality becomes a solving tool, not just a bounding tool.

Convexity is the enabling assumption

Strong duality is most commonly guaranteed in convex optimization problems of the form:

$$\begin{aligned} \text{minimize}_x \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \; i=1,\dots,m \\ & Ax = b \end{aligned}$$

where:

  • $f$ is convex,
  • each $g_i$ is convex,
  • $Ax = b$ is affine.

(Equality constraints can more generally be affine functions $h_j(x) = 0$; writing them as $Ax = b$ is a clean special case.)

Convexity matters because it prevents “hidden” nonconvex geometry that can create a duality gap.

Slater’s condition (the standard constraint qualification)

A widely used sufficient condition for strong duality is Slater’s condition:

For convex problems with inequality constraints $g_i(x) \le 0$ and affine equalities, if there exists a point $\tilde{x}$ such that

  • $g_i(\tilde{x}) < 0$ for all $i$ (strict feasibility),
  • and $A\tilde{x} = b$,

then strong duality holds and the KKT conditions are necessary and sufficient.

So Slater is an “interior point exists” condition.

Why do we need it? Intuitively:

  • If the feasible set has nonempty interior (relative to the equality constraints), multipliers behave nicely.
  • If feasible points only exist on the boundary in a fragile way, multipliers may blow up or fail to represent the problem tightly.

What strong duality gives you (operationally)

If strong duality holds:

$$p^* = d^*.$$

Moreover, there exist optimal multipliers $(\lambda^*,\nu^*)$ such that:

  • $d(\lambda^*,\nu^*) = p^*$,
  • and a primal optimal solution $x^*$ together with $(\lambda^*,\nu^*)$ satisfies the KKT conditions:
    • primal feasibility,
    • dual feasibility ($\lambda^* \ge 0$),
    • complementary slackness ($\lambda_i^* g_i(x^*) = 0$),
    • stationarity ($\nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0$).

In other words, duality explains why KKT works: it’s the optimality condition for a saddle point of the Lagrangian.

Saddle-point picture (a unifying mental model)

Think of the Lagrangian as a two-player game:

  • Player X chooses $x$ to minimize $L(x,\lambda,\nu)$.
  • Player $\Lambda$ chooses $(\lambda,\nu)$ (with $\lambda \ge 0$) to maximize $L(x,\lambda,\nu)$.

The primal problem is like:

$$p^* = \inf_{x \text{ feasible}} f(x).$$

The dual problem is:

$$d^* = \sup_{\lambda \ge 0,\, \nu} \inf_x L(x,\lambda,\nu).$$

A saddle point $(x^*,\lambda^*,\nu^*)$ satisfies:

$$L(x^*,\lambda,\nu) \le L(x^*,\lambda^*,\nu^*) \le L(x,\lambda^*,\nu^*)$$

for all feasible multipliers and all $x$.

  • The right inequality says $x^*$ minimizes the Lagrangian under the optimal multipliers.
  • The left inequality says the optimal multipliers make the Lagrangian as large as possible at $x^*$.

Strong duality is closely related to the existence of such a saddle point.

Pause point #2 (Slater sanity-check)

Decide whether Slater’s condition holds.

Problem A:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x \\ \text{subject to} \quad & x^2 \le 1 \end{aligned}$$

Is there a strictly feasible point? Yes: $x = 0$ gives $0 < 1$. So Slater holds.

Problem B:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x \\ \text{subject to} \quad & x^2 \le 0 \end{aligned}$$

The feasible set is only $\{0\}$. There is no $x$ with $x^2 < 0$, so Slater fails.

Take 30 seconds to reflect: both problems are convex, but B has a “boundary-only” feasible set.

Slater failing does not automatically mean strong duality fails, but it removes a key guarantee.

Strong duality is not universal

For nonconvex problems, duality can be loose:

  • The dual still provides a lower bound (always true).
  • But $d^*$ can be strictly less than $p^*$.

This is why in combinatorial optimization, one often uses duality as a relaxation: solve the dual (or a related convex relaxation) to get bounds even if you can’t solve the primal exactly.

A practical consequence: dual optimality = optimal certificate

When strong duality holds, a dual optimal solution $(\lambda^*,\nu^*)$ is a certificate of optimality:

  • If you have a candidate primal feasible $x$ and a dual feasible $(\lambda,\nu)$ such that $f(x) = d(\lambda,\nu)$, then you have proven global optimality (because $d(\lambda,\nu) \le p^* \le f(x)$).

This “sandwich” proof is one of the most useful ways to trust an optimizer.
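A tiny numeric illustration of the sandwich on the pause-point problem (minimize $x^2$ subject to $x \ge 1$), using its closed-form dual $d(\lambda) = \lambda - \lambda^2/4$:

```python
def f(x):
    return x ** 2                    # primal objective

def d(lam):
    return lam - lam ** 2 / 4.0      # dual function of: min x^2 s.t. x >= 1

x_cand, lam_cand = 1.0, 2.0          # candidate primal/dual pair
assert x_cand >= 1.0                 # primal feasible
assert lam_cand >= 0.0               # dual feasible

# Sandwich: d(lam) <= p* <= f(x). A zero gap certifies global optimality.
gap = f(x_cand) - d(lam_cand)
print(gap)  # 0.0 -> x_cand = 1 is certified globally optimal
```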

Application/Connection — Dual Ascent and How Duality Becomes an Algorithm

Duality is not only theory; it directly yields optimization methods.

The dual problem as concave maximization

The dual problem is:

$$\max_{\lambda \ge 0,\, \nu} \; d(\lambda,\nu)$$

with $d$ concave. So we are maximizing a concave function, which is the "nice" direction (equivalently, minimizing the convex function $-d$).

But there's a catch: $d(\lambda,\nu)$ is defined via an infimum over $x$. It may be nonsmooth.

Computing a subgradient of the dual function

Suppose for given $(\lambda,\nu)$ we can find

$$x(\lambda,\nu) \in \arg\min_x L(x,\lambda,\nu).$$

Then a subgradient of $d$ is given by the constraint residuals at that minimizing $x$:

  • For inequalities: the subdifferential of $d$ in $\lambda_i$ contains $g_i(x(\lambda,\nu))$.
  • For equalities: the subdifferential of $d$ in $\nu_j$ contains $h_j(x(\lambda,\nu))$.

Intuition:

  • If a constraint is violated ($g_i(x) > 0$), increasing $\lambda_i$ should raise the penalty and thus raise the lower bound.
  • If a constraint is satisfied with slack ($g_i(x) < 0$), increasing $\lambda_i$ may actually lower $d$ (and the optimizer will keep $\lambda_i$ from growing).

Dual ascent updates

A basic dual ascent / subgradient method:

  1. Given $(\lambda^k,\nu^k)$, compute

     $$x^k \in \arg\min_x L(x,\lambda^k,\nu^k).$$

  2. Update the multipliers using the residuals:

     $$\lambda^{k+1} = \Pi_{\lambda\ge 0}\big(\lambda^k + \alpha_k\, g(x^k)\big)$$

     $$\nu^{k+1} = \nu^k + \alpha_k\, h(x^k)$$

Where:

  • $g(x^k)$ is the vector $(g_1(x^k),\dots,g_m(x^k))$.
  • $h(x^k)$ is the vector $(h_1(x^k),\dots,h_p(x^k))$.
  • $\Pi_{\lambda\ge 0}$ projects onto the nonnegative orthant (componentwise max with 0).
  • $\alpha_k$ is a step size.

Why projection only for $\lambda$? Because dual feasibility requires $\lambda \ge 0$ for the inequalities, while $\nu$ is free.

How to choose step sizes (briefly)

Subgradient methods are sensitive to step sizes.

Common choices:

  • Constant step size: $\alpha_k = \alpha$ (often converges only to a neighborhood of the optimum).
  • Diminishing step size: $\alpha_k \to 0$ with $\sum_k \alpha_k = \infty$ (the classic convergence conditions for subgradient methods).

In practice, many systems use more advanced variants (accelerations, adaptive rules) or different splitting methods (e.g., ADMM) that are also rooted in the Lagrangian.
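The updates above fit in a few lines of Python. This is a minimal sketch, not a production solver: the caller supplies the inner $\arg\min_x$, and the names (`dual_ascent`, `argmin_L`, `alpha0`) are illustrative:

```python
import numpy as np

def dual_ascent(argmin_L, g, h, lam0, nu0, steps=500, alpha0=1.0):
    """Projected subgradient ascent on the dual function d(lam, nu).

    argmin_L(lam, nu) -> an x minimizing L(x, lam, nu)
    g(x), h(x)        -> inequality / equality residual vectors
    Uses the diminishing step alpha_k = alpha0 / (k + 1).
    """
    lam = np.asarray(lam0, dtype=float)
    nu = np.asarray(nu0, dtype=float)
    for k in range(steps):
        x = argmin_L(lam, nu)
        alpha = alpha0 / (k + 1)
        lam = np.maximum(0.0, lam + alpha * g(x))  # project onto lam >= 0
        nu = nu + alpha * h(x)                     # nu is unconstrained
    return x, lam, nu

# Example: minimize x^2 s.t. x >= 1, i.e. g(x) = 1 - x (no equalities).
# The inner minimizer of L(x, lam) = x^2 + lam*(1 - x) is x = lam/2.
x, lam, _ = dual_ascent(
    argmin_L=lambda lam, nu: lam / 2.0,
    g=lambda x: 1.0 - x,
    h=lambda x: np.zeros(0),
    lam0=[0.0], nu0=[],
)
print(x, lam)  # approaches x* = 1, lam* = 2
```

The diminishing step makes the iterates approach the optimum slowly; swapping in a constant step would instead hover near it.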

Why dual ascent can be efficient: separability and decomposition

A major payoff occurs when the Lagrangian separates over components of xx.

Suppose $x = (x_1,\dots,x_N)$ and

$$f(x) = \sum_{n=1}^N f_n(x_n)$$

and the constraints couple the blocks only through a simple sum, e.g.

$$\sum_{n=1}^N A_n x_n = b.$$

Then

$$L(x,\nu) = \sum_{n=1}^N f_n(x_n) + \nu^T\left(\sum_{n=1}^N A_n x_n - b\right) = \sum_{n=1}^N \left(f_n(x_n) + \nu^T A_n x_n\right) - \nu^T b.$$

Now the minimization over $x$ splits:

$$\inf_x L(x,\nu) = \sum_{n=1}^N \inf_{x_n}\left(f_n(x_n) + \nu^T A_n x_n\right) - \nu^T b.$$

So computing $d(\nu)$ amounts to $N$ smaller problems, often parallelizable.

This is the conceptual foundation of distributed optimization: multipliers coordinate many local optimizations.
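A concrete miniature of the decomposition, with $N = 3$ scalar quadratics coupled by one sum constraint (a sketch; the data and step size are illustrative):

```python
import numpy as np

# N = 3 local objectives f_n(x_n) = 0.5*(x_n - c_n)^2,
# coupled only through the single constraint x_1 + x_2 + x_3 = b.
c = np.array([1.0, 4.0, 7.0])
b = 6.0

def d(nu):
    # The inner minimization splits: each x_n = c_n - nu independently.
    x = c - nu
    return np.sum(0.5 * (x - c) ** 2 + nu * x) - nu * b

# Dual ascent on the single multiplier nu (gradient = coupling residual):
nu = 0.0
for _ in range(200):
    x = c - nu                      # N independent local solves
    nu += 0.1 * (np.sum(x) - b)     # coordinate the locals via the price nu

x = c - nu
print(x, d(nu))  # primal recovers x = [-1, 2, 5]; d(nu) reaches p* = 6
```

Each local solve depends only on its own data $c_n$ and the shared price $\nu$, which is exactly what makes the scheme distributable.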

Connecting back to KKT

If strong duality holds and the method converges to $(\lambda^*,\nu^*)$ with corresponding $x^*$ minimizing $L(\cdot,\lambda^*,\nu^*)$, then the KKT conditions emerge:

  • Stationarity: $x^*$ minimizes $L$.
  • Complementary slackness: in the limit, multipliers grow only where constraints are tight.
  • Feasibility: residuals are driven toward 0 by the multiplier updates.

This is a clean story:

  • KKT describes optimality.
  • Dual ascent is one route to satisfy KKT by iteratively tuning multipliers.

A second interactive-canvas moment (algorithm intuition)

If you can animate iterations:

  • Show a point $x^k$ moving in primal space (perhaps the unconstrained minimizer of $L$).
  • Show $\lambda^k$ changing over time.
  • Overlay the constraint boundary $g(x) = 0$.

What learners should observe:

  • If $x^k$ violates the constraint, $\lambda$ increases, "tilting" $L$ so the next minimizer is pushed back toward feasibility.
  • If $x^k$ is strictly feasible, $\lambda$ may decrease toward 0.

This makes dual ascent feel like an automatic constraint-enforcement mechanism driven by prices.

Worked Examples (3)

Dual function and dual optimum for a 1D quadratic with an inequality

Primal:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x^2 \\ \text{subject to} \quad & x \ge 1 \end{aligned}$$

Rewrite $x \ge 1$ as $g(x) = 1 - x \le 0$.

  1. Form the Lagrangian (single multiplier):

     $$L(x,\lambda) = x^2 + \lambda(1-x), \quad \lambda \ge 0.$$

  2. Compute the dual function:

     $$d(\lambda) = \inf_x \left(x^2 - \lambda x + \lambda\right).$$

  3. Minimize the quadratic over $x$ by setting the derivative to zero:

     $$\frac{d}{dx}\left(x^2 - \lambda x + \lambda\right) = 2x - \lambda = 0 \;\Rightarrow\; x^*(\lambda) = \frac{\lambda}{2}.$$

  4. Plug back in to get the infimum value:

     $$\begin{aligned} d(\lambda) &= \left(\frac{\lambda}{2}\right)^2 - \lambda\left(\frac{\lambda}{2}\right) + \lambda \\ &= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} + \lambda \\ &= -\frac{\lambda^2}{4} + \lambda. \end{aligned}$$

  5. Solve the dual problem:

     $$\max_{\lambda\ge 0}\; -\frac{\lambda^2}{4} + \lambda.$$

  6. This is a concave parabola. Differentiate:

     $$\frac{d}{d\lambda}\left(-\frac{\lambda^2}{4} + \lambda\right) = -\frac{\lambda}{2} + 1.$$

  7. Set to zero: $-\lambda/2 + 1 = 0 \Rightarrow \lambda^* = 2$ (which satisfies $\lambda \ge 0$).

  8. Compute the dual optimal value:

     $$d^* = d(2) = -\frac{4}{4} + 2 = 1.$$

  9. Compute the primal optimum for comparison: the constraint forces $x \ge 1$, and $x^2$ is minimized over that set at $x = 1$, so $p^* = 1$.

Insight: Here $d^* = p^*$, so strong duality holds (the problem is convex and Slater holds, since e.g. $x = 2$ gives strict feasibility $1 - 2 < 0$). Also notice the meaning of $\lambda^* = 2$: it is the price that makes the unconstrained minimizer of $L$ land exactly on the boundary $x = 1$ (since $x^*(\lambda) = \lambda/2$).

Deriving the dual of a quadratic program with an equality constraint

Primal (a simple equality-constrained QP):

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}^n} \quad & \frac{1}{2}x^T Q x + c^T x \\ \text{subject to} \quad & Ax = b \end{aligned}$$

Assume $Q \succ 0$ (positive definite), so the objective is strictly convex and the minimizer is unique for each $\nu$.

  1. Form the Lagrangian (only equality multipliers):

     $$L(x,\nu) = \frac{1}{2}x^T Q x + c^T x + \nu^T(Ax - b).$$

  2. Compute the dual function:

     $$d(\nu) = \inf_x L(x,\nu).$$

  3. Minimize over $x$ by stationarity (differentiate w.r.t. $x$ and set to 0):

     $$\nabla_x L(x,\nu) = Qx + c + A^T\nu = 0.$$

  4. Solve for the minimizing $x$ as a function of $\nu$:

     $$x^*(\nu) = -Q^{-1}(c + A^T\nu).$$

  5. Plug back into the Lagrangian. First regroup:

     $$L(x,\nu) = \frac{1}{2}x^T Q x + (c + A^T\nu)^T x - \nu^T b.$$

  6. Use the standard quadratic minimization identity: for $Q \succ 0$,

     $$\inf_x \left(\frac{1}{2}x^T Q x + q^T x\right) = -\frac{1}{2} q^T Q^{-1} q,$$

     attained at $x = -Q^{-1}q$. Here $q = c + A^T\nu$.

  7. Therefore:

     $$d(\nu) = -\frac{1}{2}(c + A^T\nu)^T Q^{-1}(c + A^T\nu) - \nu^T b.$$

  8. Write the dual problem:

     $$\max_{\nu\in\mathbb{R}^p}\; -\frac{1}{2}(c + A^T\nu)^T Q^{-1}(c + A^T\nu) - \nu^T b.$$

  9. Optionally expand into a standard concave quadratic in $\nu$: using

     $$(c + A^T\nu)^T Q^{-1}(c + A^T\nu) = c^T Q^{-1} c + 2\nu^T A Q^{-1} c + \nu^T A Q^{-1} A^T \nu,$$

     we get

     $$d(\nu) = -\frac{1}{2}\nu^T(A Q^{-1} A^T)\nu - \nu^T(A Q^{-1} c + b) - \frac{1}{2}c^T Q^{-1} c.$$

Insight: This example shows the typical dual pattern: (1) build $L$, (2) compute $d$ by eliminating $x$ via an infimum, and (3) get a concave maximization over the multipliers. The matrix $A Q^{-1} A^T$ governs the dual curvature. When the primal is large but $p$ (the number of equality constraints) is small, the dual can be much cheaper.
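The closed-form dual can be checked numerically on a small instance (a sketch; the matrices are arbitrary illustrative data):

```python
import numpy as np

# Instance data: min 0.5 x^T Q x + c^T x  subject to  Ax = b.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # positive definite
c = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])           # p = 1 equality constraint
b = np.array([1.0])

Qinv = np.linalg.inv(Q)

def d(nu):
    # Closed-form dual from the derivation above.
    q = c + A.T @ nu
    return -0.5 * q @ Qinv @ q - nu @ b

# Dual optimum: set grad d(nu) = -A Q^{-1}(c + A^T nu) - b = 0 (linear in nu).
M = A @ Qinv @ A.T
nu_star = np.linalg.solve(M, -(A @ Qinv @ c + b))

# Primal solution recovered from the minimizing x of the Lagrangian.
x_star = -Qinv @ (c + A.T @ nu_star)
p_star = 0.5 * x_star @ Q @ x_star + c @ x_star

assert np.allclose(A @ x_star, b)      # primal feasibility
assert np.isclose(d(nu_star), p_star)  # strong duality: d* = p*
print(x_star, float(d(nu_star)))
```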

One iteration of dual ascent for an inequality-constrained problem

Primal:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & (x-3)^2 \\ \text{subject to} \quad & x \le 1 \end{aligned}$$

Rewrite as $g(x) = x - 1 \le 0$. Lagrangian: $L(x,\lambda) = (x-3)^2 + \lambda(x-1)$, with $\lambda \ge 0$.

We'll run one dual-ascent step with step size $\alpha = 0.5$, starting from $\lambda^0 = 0$.

  1. Given $\lambda^0 = 0$, minimize $L(x,\lambda^0)$ over $x$:

     $$L(x,0) = (x-3)^2.$$

     The minimizer is $x^0 = 3$.

  2. Compute the constraint residual at $x^0$:

     $$g(x^0) = x^0 - 1 = 2 > 0,$$

     so the constraint is violated.

  3. Update the multiplier using projected ascent:

     $$\lambda^1 = \max\{0,\, \lambda^0 + \alpha g(x^0)\} = \max\{0,\, 0 + 0.5 \cdot 2\} = 1.$$

  4. Now (to see the effect), compute the next primal minimizer for $\lambda^1 = 1$: minimize $L(x,1) = (x-3)^2 + (x-1)$. Differentiate:

     $$\frac{d}{dx}\left((x-3)^2 + x - 1\right) = 2(x-3) + 1 = 2x - 5.$$

     Set to zero: $2x - 5 = 0 \Rightarrow x^1 = 2.5$.

     Still infeasible, but closer to the constraint boundary $x \le 1$ than 3.

Insight: Dual ascent increases $\lambda$ when constraints are violated. That increase changes the unconstrained minimizer of $L$, pushing the next $x$ toward feasibility. With proper step sizes (and under convexity/regularity), iterating this drives the residuals down and converges to the constrained optimum.
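Continuing the worked iteration in code, using the closed-form inner minimizer $x^*(\lambda) = 3 - \lambda/2$ and the same constant step size:

```python
lam = 0.0     # lambda^0, as in the worked steps
alpha = 0.5   # same constant step size

for k in range(50):
    x = 3.0 - lam / 2.0                        # argmin_x L(x, lam)
    lam = max(0.0, lam + alpha * (x - 1.0))    # projected ascent on g(x) = x - 1

# The first iterations reproduce the steps above: x = 3, lam -> 1, x -> 2.5, ...
print(round(x, 4), round(lam, 4))  # converges to x* = 1.0, lam* = 4.0
```

Here the smooth, strongly curved dual makes even a constant step converge; in general (nonsmooth duals) the diminishing-step rules discussed earlier are the safe choice.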

Key Takeaways

  • The Lagrangian $L(x,\lambda,\nu) = f(x) + \lambda^T g(x) + \nu^T h(x)$ combines objective and constraints using multipliers, with $\lambda \ge 0$ for inequalities.

  • The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimal value $p^*$ when $\lambda \ge 0$ (weak duality).

  • The dual problem $\max_{\lambda\ge 0,\,\nu} d(\lambda,\nu)$ chooses the tightest lower bound; its value $d^*$ satisfies $d^* \le p^*$.

  • Even if the primal is nonconvex, the dual function is concave because it is the pointwise infimum of functions affine in $(\lambda,\nu)$.

  • In convex problems, Slater's condition (existence of a strictly feasible point) is a standard sufficient condition for strong duality and for KKT necessity/sufficiency.

  • When strong duality holds, matching primal/dual values provide a certificate of optimality: $d(\lambda,\nu) \le p^* \le f(x)$, and equality proves optimality.

  • Dual ascent (subgradient ascent on $d$) updates multipliers using constraint residuals; projection enforces $\lambda \ge 0$.

  • Duality becomes especially powerful when $\inf_x L$ decomposes across components of $x$, enabling parallel and distributed optimization.

Common Mistakes

  • Forgetting the sign restriction $\lambda \ge 0$ for inequality constraints, which breaks the lower-bound (weak duality) argument.

  • Confusing $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ with $\min_x f(x)$ or with the constrained minimum; $d$ is an unconstrained infimum over $x$ of the Lagrangian.

  • Assuming strong duality always holds; without convexity plus a constraint qualification (e.g., Slater), the duality gap can be positive.

  • Running dual ascent without projection on $\lambda$, or with unsuitable step sizes, leading to divergence or oscillation (common with nonsmooth dual functions).

Practice

easy

Compute the dual function for the problem: minimize $x^2$ subject to $x \le 2$. Then find the dual optimal value and compare it to the primal optimum.

Hint: Write the constraint as $g(x) = x - 2 \le 0$. Form $L(x,\lambda) = x^2 + \lambda(x-2)$ with $\lambda \ge 0$. Minimize the quadratic in $x$ to get $d(\lambda)$, then maximize over $\lambda \ge 0$.

Show solution

The constraint $x \le 2$ means $g(x) = x - 2 \le 0$.

Lagrangian: $L(x,\lambda) = x^2 + \lambda(x-2)$.

Dual function:

$$d(\lambda) = \inf_x \left(x^2 + \lambda x - 2\lambda\right).$$

Stationarity: $2x + \lambda = 0 \Rightarrow x^*(\lambda) = -\lambda/2$.

Plug in:

$$\begin{aligned} d(\lambda) &= \left(-\frac{\lambda}{2}\right)^2 + \lambda\left(-\frac{\lambda}{2}\right) - 2\lambda \\ &= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} - 2\lambda \\ &= -\frac{\lambda^2}{4} - 2\lambda. \end{aligned}$$

Maximize over $\lambda \ge 0$. This concave parabola is decreasing for $\lambda \ge 0$ (derivative $-\lambda/2 - 2 < 0$), so the maximum occurs at $\lambda^* = 0$.

Thus $d^* = d(0) = 0$.

Primal optimum: the unconstrained minimizer $x = 0$ is feasible (since $0 \le 2$), so $p^* = 0$.

Hence $d^* = p^* = 0$ (strong duality holds).

medium

Slater check: Consider minimizing a convex function $f(x)$ subject to the single constraint $\|x\|_2 \le 1$ (in $\mathbb{R}^n$). Does Slater's condition hold? What if the constraint is $\|x\|_2 \le 0$?

Hint: Slater requires a strictly feasible point for inequalities: find an $x$ such that $\|x\|_2 < 1$ (or $< 0$).

Show solution

For $\|x\|_2 \le 1$, Slater holds because $x = 0$ satisfies $\|0\|_2 = 0 < 1$.

For $\|x\|_2 \le 0$, the feasible set is only $\{0\}$ and there is no $x$ with $\|x\|_2 < 0$. So Slater fails.

hard

Dual ascent reasoning: For a problem with one inequality constraint $g(x) \le 0$, suppose at iteration $k$ you find $x^k \in \arg\min_x L(x,\lambda^k)$ and you observe $g(x^k) > 0$. What does the update $\lambda^{k+1} = \max\{0, \lambda^k + \alpha g(x^k)\}$ do, and why is that direction sensible for maximizing $d(\lambda)$?

Hint: Interpret $g(x^k)$ as a (sub)gradient of $d$ at $\lambda^k$. Also interpret $\lambda$ as a penalty weight on violations.

Show solution

If $g(x^k) > 0$, the constraint is violated at the current Lagrangian minimizer. The update increases $\lambda$ (since $\lambda^k + \alpha g(x^k) > \lambda^k$), making future minimizations of $L(x,\lambda)$ penalize positive $g(x)$ more heavily.

This is sensible because (under mild conditions) $g(x^k)$ is a subgradient of the concave dual function $d(\lambda) = \inf_x \big(f(x) + \lambda g(x)\big)$. For concave maximization, ascending along a subgradient increases $d$ locally. The projection $\max\{0,\cdot\}$ preserves dual feasibility $\lambda \ge 0$, which is required for weak duality and the correctness of the dual problem.
