Lagrangian Duality


Dual problems, weak and strong duality. Dual ascent.


Core Concepts

  • Lagrangian function $L(x, \lambda, \nu)$: the scalar combining objective and constraint terms via multipliers
  • Dual objects: the dual function $d(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$ and the derived dual optimization problem (maximize $d$ over the multipliers, with $\lambda \ge 0$ for inequality constraints)

Key Symbols & Notation

  • $L(x, \lambda, \nu)$: the Lagrangian
  • $d(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$: the dual function

Essential Relationships

  • Weak and strong duality: for any feasible multipliers, $d(\lambda,\nu)$ is a lower bound on the primal optimum (weak duality); under convexity plus a constraint qualification (e.g., Slater), the dual optimum equals the primal optimum (strong duality)
  • Dual ascent connection: the (sub)gradient of $d$ with respect to the multipliers equals the constraint residuals, yielding the iterative update $\lambda \leftarrow \Pi_{\ge 0}(\lambda + \alpha \cdot \text{residual})$ to maximize $d$
All Concepts (13)

  • Lagrangian (as a function of primal variables x and multiplier vector λ) used to form a lower bound on the primal objective via inequality multipliers
  • Lagrangian dual function q(λ) defined as the infimum (over primal variables) of the Lagrangian: q(λ) = inf_x L(x,λ)
  • Lagrangian dual problem: maximize q(λ) subject to dual feasibility (λ ≥ 0 for inequality constraints)
  • Duality gap: the nonnegative difference between primal optimal value and dual optimal value
  • Weak duality: the principle that any dual feasible value is a lower bound on the primal objective (so dual optimum ≤ primal optimum)
  • Strong duality: the situation (often under convexity + a constraint qualification such as Slater's condition) where primal and dual optima coincide (zero duality gap)
  • Slater's condition (a sufficient constraint qualification for strong duality in convex problems): existence of a strictly feasible primal point
  • Concavity of the dual function q(λ) in λ (regardless of primal convexity)
  • Dual variables interpreted as shadow prices (economic/interpretational meaning of λ components)
  • Subgradient structure of the dual function: relationship between minimizers of L(x,λ) and subgradients of q
  • Primal recovery via argmin of the Lagrangian: x*(λ) ∈ argmin_x L(x,λ) gives candidate primal solutions
  • Dual ascent algorithm: iterative (sub)gradient-ascent on q(λ) to find maximizing λ, alternating minimization in x and multiplier updates
  • Projection onto the nonnegative orthant used in updates to enforce λ ≥ 0

Teaching Strategy

Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.

Constrained optimization feels like juggling: you want to minimize an objective while never dropping the constraints. Lagrangian duality turns that juggling act into a different game—choose prices (multipliers) for constraint violation so that the best “priced” solution automatically respects the constraints. The surprising part: those prices also generate guaranteed lower bounds on the true optimum, and in many important cases the bound becomes exact.

TL;DR:

Given a primal minimization problem with constraints, form the Lagrangian $L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$ with $\lambda \ge 0$. The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimum (weak duality). The dual problem maximizes this bound over the multipliers. Under conditions like Slater's condition for convex problems, the bound is tight (strong duality). Dual ascent updates multipliers to improve the bound and can solve large problems by exploiting separability and the structure of $\inf_x L$.

What Is Lagrangian Duality?

The primal problem (what we start with)

A standard constrained minimization problem (the primal) is:

$$\begin{aligned} \text{minimize}_{x} \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \; i=1,\dots,m \\ & h_j(x) = 0, \; j=1,\dots,p \end{aligned}$$

Here:

  • $x$ is the decision variable (can be a scalar, vector, or structured object).
  • $f(x)$ is the objective.
  • $g_i(x) \le 0$ are inequality constraints.
  • $h_j(x) = 0$ are equality constraints.

You already know KKT conditions and Lagrange multipliers. Duality builds on those ideas but changes the emphasis: instead of directly enforcing constraints, we price violations.

The Lagrangian (the key scalar object)

Define the Lagrangian:

$$L(x,\lambda,\nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x)$$

  • $\lambda \in \mathbb{R}^m$ are multipliers for the inequalities.
  • $\nu \in \mathbb{R}^p$ are multipliers for the equalities.
  • For inequality constraints, we restrict $\lambda \ge 0$ (componentwise). This is not cosmetic: it makes the "pricing" consistent.

Intuition:

  • If $g_i(x) \le 0$ is satisfied, then $\lambda_i g_i(x) \le 0$ (since $\lambda_i \ge 0$), so the constraint term can only decrease (or leave unchanged) the Lagrangian.
  • If the constraint is violated ($g_i(x) > 0$), then $\lambda_i g_i(x) > 0$ whenever $\lambda_i > 0$, so the Lagrangian penalizes that violation.

This mirrors a familiar idea: “soft constraints” with penalties. But duality is more precise: the penalties are variables you optimize.

The dual function: best value of the priced problem

Given multipliers $(\lambda,\nu)$, consider minimizing the Lagrangian over $x$:

$$d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$$

This $d$ is called the dual function.

Why an infimum (not necessarily a minimum)? Because for some multipliers the Lagrangian may not attain its minimum, or may be unbounded below (the infimum is $-\infty$).

Two important facts (we will prove them carefully soon):

  1. For any $\lambda \ge 0$, $d(\lambda,\nu)$ is a lower bound on the primal optimal value $p^*$.
  2. $d(\lambda,\nu)$ is concave in $(\lambda,\nu)$, even when the primal problem is not convex.

The dual problem: maximize the best lower bound

If each $(\lambda,\nu)$ gives a lower bound, we should pick the best one:

$$\begin{aligned} \text{maximize}_{\lambda,\nu} \quad & d(\lambda,\nu) \\ \text{subject to} \quad & \lambda \ge 0 \end{aligned}$$

This is the Lagrange dual problem. Its optimal value is denoted $d^*$.

Why duality is worth learning

Duality is not just “another way to write the same thing.” It gives:

  • Certificates (bounds): Any feasible dual point $(\lambda,\nu)$ immediately certifies a lower bound on $p^*$. This is extremely useful in algorithms and in debugging optimization models.
  • Alternative computation: Sometimes the primal is hard, while the dual is simpler (fewer variables, decomposes, or has a closed form).
  • Structure and insight: Multipliers often have meaning as shadow prices (marginal values of constraints).
  • Algorithms: Dual ascent, subgradient methods on the dual, ADMM, and many distributed optimization methods are built from the dual view.

We’ll proceed slowly: first we’ll make the “lower bound” statement completely concrete; then we’ll talk about when the bound becomes tight; then we’ll turn that into an algorithm.

Core Mechanic 1 — The Dual Function as an Infimum (and Why It Lower-Bounds the Primal)

Step 1: Fix multipliers and look at L(x,λ,ν)L(x,\lambda,\nu)

For a fixed $(\lambda,\nu)$ with $\lambda \ge 0$, the Lagrangian is just a scalar function of $x$:

$$L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).$$

Now imagine two categories of $x$:

  • Feasible: satisfies $g_i(x) \le 0$ and $h_j(x) = 0$.
  • Infeasible: violates at least one constraint.

For feasible $x$, we can compare $L(x,\lambda,\nu)$ and $f(x)$.

Step 2: For feasible $x$, the Lagrangian never exceeds the objective

If $x$ is feasible, then:

  • Each $g_i(x) \le 0$ and each $\lambda_i \ge 0$, so $\lambda_i g_i(x) \le 0$.
  • Each $h_j(x) = 0$, so $\nu_j h_j(x) = 0$.

Therefore:

$$L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x) \le f(x).$$

This inequality is the heart of weak duality.

Step 3: Take the best (lowest) Lagrangian value over all $x$

By definition,

$$d(\lambda,\nu) = \inf_x L(x,\lambda,\nu).$$

The infimum over all $x$ is certainly no larger than the value at any particular feasible $x$:

$$d(\lambda,\nu) \le L(x,\lambda,\nu) \le f(x) \quad \text{for any feasible } x.$$

Now take the infimum over all feasible $x$ on the right-hand side. The best feasible objective value is the primal optimal value $p^*$:

$$d(\lambda,\nu) \le p^* \quad \text{for all } \lambda \ge 0.$$

This is weak duality.

Weak duality: For a primal minimization problem, any dual-feasible $(\lambda,\nu)$ gives a lower bound on the primal optimal value.

Pause point #1 (tiny sanity-check)

Before moving on, check that you can literally compute a dual function once.

Consider the 1D problem:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x^2 \\ \text{subject to} \quad & x \ge 1 \end{aligned}$$

Rewrite the constraint as $g(x) = 1 - x \le 0$.

  • Lagrangian: $L(x,\lambda) = x^2 + \lambda(1-x)$ with $\lambda \ge 0$.
  • Dual function: $d(\lambda) = \inf_x \big(x^2 + \lambda(1-x)\big)$.

Try computing $d(\lambda)$ by minimizing over $x$ (complete the square or set the derivative to zero). Then compare $\max_{\lambda \ge 0} d(\lambda)$ with the primal optimum $p^* = 1$.

(We'll do a closely related worked example later, but you should attempt this now: this is the first "feel" of $\inf_x$.)
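If you want to check your answer numerically, here is a minimal sketch in Python; the closed-form inner minimizer $x^*(\lambda) = \lambda/2$ is the one you should find by hand, and the name `dual_fn` is illustrative:

```python
import numpy as np

def dual_fn(lam):
    # d(lam) = inf_x [x^2 + lam*(1 - x)]; the quadratic is minimized
    # at x = lam/2, giving d(lam) = lam - lam^2/4.
    x_star = lam / 2.0
    return x_star ** 2 + lam * (1.0 - x_star)

# Weak duality: every d(lam) with lam >= 0 lower-bounds p* = 1 ...
lams = np.linspace(0.0, 4.0, 101)
assert np.all(dual_fn(lams) <= 1.0 + 1e-12)

# ... and the bound is tight at lam* = 2, where d(2) = 1 = p*.
print(dual_fn(2.0))  # 1.0
```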

The duality gap

Define:

  • Primal optimal value: $p^*$
  • Dual optimal value: $d^*$

Weak duality implies:

$$d^* \le p^*.$$

The difference $p^* - d^*$ is the duality gap.

  • If $p^* = d^*$, duality is tight and we have strong duality.
  • If $p^* > d^*$, there is a positive gap.

Why the dual function is concave (important for optimization)

This may feel counterintuitive: $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is an infimum of functions affine in $(\lambda,\nu)$. But the pointwise infimum of affine functions is concave.

Proof sketch (worth internalizing): for each fixed $x$, define

$$\phi_x(\lambda,\nu) = L(x,\lambda,\nu).$$

As a function of $(\lambda,\nu)$, $\phi_x$ is affine (linear plus a constant) because it equals $f(x) + \lambda^T g(x) + \nu^T h(x)$.

Then

$$d(\lambda,\nu) = \inf_x \phi_x(\lambda,\nu).$$

The pointwise infimum of affine functions is concave. Concretely, for $0 \le t \le 1$:

$$\begin{aligned} d(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2)) &= \inf_x \phi_x(t(\lambda_1,\nu_1)+(1-t)(\lambda_2,\nu_2)) \\ &= \inf_x \left(t\,\phi_x(\lambda_1,\nu_1)+(1-t)\,\phi_x(\lambda_2,\nu_2)\right) \\ &\ge t\inf_x \phi_x(\lambda_1,\nu_1) + (1-t)\inf_x \phi_x(\lambda_2,\nu_2) \\ &= t\, d(\lambda_1,\nu_1) + (1-t)\, d(\lambda_2,\nu_2). \end{aligned}$$

That inequality step is the key: the infimum of a sum is $\ge$ the sum of the infima.
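The concavity inequality is easy to spot-check numerically; a minimal sketch using the closed-form dual $d(\lambda) = \lambda - \lambda^2/4$ of the pause-point problem (minimize $x^2$ subject to $x \ge 1$):

```python
import numpy as np

def d(lam):
    # Closed-form dual of: minimize x^2 subject to x >= 1.
    return lam - lam ** 2 / 4.0

rng = np.random.default_rng(0)

# Check d(t*a + (1-t)*b) >= t*d(a) + (1-t)*d(b) on random samples.
for _ in range(1000):
    a, b = rng.uniform(0.0, 10.0, size=2)
    t = rng.uniform()
    lhs = d(t * a + (1 - t) * b)
    rhs = t * d(a) + (1 - t) * d(b)
    assert lhs >= rhs - 1e-9

print("concavity inequality holds on all sampled points")
```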

Interactive-canvas moment: visualize “lower bound curves”

If your tech tree platform supports an interactive plot, this is the perfect place to attach it.

Suggested interaction:

  • Pick a simple primal (like minimizing a quadratic with a linear inequality).
  • For each $\lambda \ge 0$, compute $d(\lambda)$.
  • Plot the curve $d(\lambda)$ and show the horizontal line at $p^*$.

What you should see:

  • For small $\lambda$, the penalty is weak; the infimum may be far below $p^*$.
  • As $\lambda$ changes, $d(\lambda)$ rises (not necessarily monotonically, but it has a concave "cap" shape).
  • At the maximizing $\lambda^*$, $d(\lambda^*)$ is the best lower bound; in strong-duality cases it meets $p^*$.

This visualization makes the definition $d(\lambda) = \inf_x L(x,\lambda)$ feel less abstract: you are literally pushing up the best guaranteed lower bound by tuning prices.
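Without a plotting canvas, the same picture can be computed on a grid; a minimal sketch for the primal minimize $(x-3)^2$ subject to $x \le 1$ (so $p^* = 4$; the grid bounds are illustrative):

```python
import numpy as np

# Primal: minimize (x - 3)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
# The primal optimum is x* = 1 with p* = 4.
p_star = 4.0

xs = np.linspace(-10.0, 10.0, 20001)   # grid over x for the inner infimum

def d(lam):
    # d(lam) = inf_x [(x - 3)^2 + lam*(x - 1)], approximated on the grid.
    return np.min((xs - 3.0) ** 2 + lam * (xs - 1.0))

lams = np.linspace(0.0, 8.0, 81)
vals = np.array([d(l) for l in lams])

assert np.all(vals <= p_star + 1e-9)   # every d(lam) is a valid lower bound
best = lams[np.argmax(vals)]
print(best, d(best))                   # best bound near lam* = 4, where d meets p*
```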

Core Mechanic 2 — Strong Duality: When the Lower Bound Becomes Exact

Weak duality always holds, but strong duality is where duality becomes a solving tool, not just a bounding tool.

Convexity is the enabling assumption

Strong duality is most commonly guaranteed in convex optimization problems of the form:

$$\begin{aligned} \text{minimize}_x \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \; i=1,\dots,m \\ & Ax = b \end{aligned}$$

where:

  • $f$ is convex,
  • each $g_i$ is convex,
  • $Ax = b$ is affine.

(Equality constraints can more generally be affine functions $h_j(x) = 0$; writing them as $Ax = b$ is a clean special case.)

Convexity matters because it prevents “hidden” nonconvex geometry that can create a duality gap.

Slater’s condition (the standard constraint qualification)

A widely used sufficient condition for strong duality is Slater’s condition:

For convex problems with inequality constraints $g_i(x) \le 0$ and affine equalities, if there exists a point $\tilde{x}$ such that

  • $g_i(\tilde{x}) < 0$ for all $i$ (strict feasibility),
  • and $A\tilde{x} = b$,

then strong duality holds and the KKT conditions are necessary and sufficient.

So Slater is an “interior point exists” condition.

Why do we need it? Intuitively:

  • If the feasible set has nonempty interior (relative to the equality constraints), multipliers behave nicely.
  • If feasible points only exist on the boundary in a fragile way, multipliers may blow up or fail to represent the problem tightly.

What strong duality gives you (operationally)

If strong duality holds:

$$p^* = d^*.$$

Moreover, there exist optimal multipliers $(\lambda^*,\nu^*)$ such that:

  • $d(\lambda^*,\nu^*) = p^*$,
  • and a primal optimal solution $x^*$ together with $(\lambda^*,\nu^*)$ satisfies the KKT conditions:
    • primal feasibility,
    • dual feasibility ($\lambda^* \ge 0$),
    • complementary slackness ($\lambda_i^* g_i(x^*) = 0$),
    • stationarity ($\nabla f(x^*) + \sum_i \lambda_i^* \nabla g_i(x^*) + \sum_j \nu_j^* \nabla h_j(x^*) = 0$).

In other words, duality explains why KKT works: it’s the optimality condition for a saddle point of the Lagrangian.

Saddle-point picture (a unifying mental model)

Think of the Lagrangian as a two-player game:

  • Player X chooses $x$ to minimize $L(x,\lambda,\nu)$.
  • Player $\Lambda$ chooses $(\lambda,\nu)$ (with $\lambda \ge 0$) to maximize $L(x,\lambda,\nu)$.

The primal problem is like:

$$p^* = \inf_{x \text{ feasible}} f(x).$$

The dual problem is:

$$d^* = \sup_{\lambda \ge 0,\, \nu} \inf_x L(x,\lambda,\nu).$$

A saddle point $(x^*,\lambda^*,\nu^*)$ satisfies:

$$L(x^*,\lambda,\nu) \le L(x^*,\lambda^*,\nu^*) \le L(x,\lambda^*,\nu^*)$$

for all feasible multipliers and all $x$.

  • The right inequality says $x^*$ minimizes the Lagrangian under the optimal multipliers.
  • The left inequality says the optimal multipliers make the Lagrangian as large as possible at $x^*$.

Strong duality is closely related to the existence of such a saddle point.

Pause point #2 (Slater sanity-check)

Decide whether Slater’s condition holds.

Problem A:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x \\ \text{subject to} \quad & x^2 \le 1 \end{aligned}$$

Is there a strictly feasible point? Yes: $x = 0$ gives $0 < 1$. So Slater holds.

Problem B:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x \\ \text{subject to} \quad & x^2 \le 0 \end{aligned}$$

The feasible set is only $\{0\}$. There is no $x$ with $x^2 < 0$, so Slater fails.

Take 30 seconds to reflect: both problems are convex, but B has a “boundary-only” feasible set.

Slater failing does not automatically mean strong duality fails, but it removes a key guarantee.

Strong duality is not universal

For nonconvex problems, duality can be loose:

  • The dual still provides a lower bound (always true).
  • But $d^*$ can be strictly less than $p^*$.

This is why in combinatorial optimization, one often uses duality as a relaxation: solve the dual (or a related convex relaxation) to get bounds even if you can’t solve the primal exactly.

A practical consequence: dual optimality = optimal certificate

When strong duality holds, a dual optimal solution $(\lambda^*,\nu^*)$ is a certificate of optimality:

  • If you have a candidate primal feasible $x$ and a dual feasible $(\lambda,\nu)$ such that $f(x) = d(\lambda,\nu)$, then you have proven global optimality (because $d(\lambda,\nu) \le p^* \le f(x)$).

This “sandwich” proof is one of the most useful ways to trust an optimizer.
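A tiny numeric illustration of the sandwich on the pause-point problem (minimize $x^2$ subject to $x \ge 1$), using its closed-form dual $d(\lambda) = \lambda - \lambda^2/4$:

```python
def f(x):
    return x ** 2                    # primal objective

def d(lam):
    return lam - lam ** 2 / 4.0      # dual function of: min x^2 s.t. x >= 1

x_cand, lam_cand = 1.0, 2.0          # candidate primal/dual pair
assert x_cand >= 1.0                 # primal feasible
assert lam_cand >= 0.0               # dual feasible

# Sandwich: d(lam) <= p* <= f(x). A zero gap certifies global optimality.
gap = f(x_cand) - d(lam_cand)
print(gap)  # 0.0 -> x_cand = 1 is certified globally optimal
```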

Application/Connection — Dual Ascent and How Duality Becomes an Algorithm

Duality is not only theory; it directly yields optimization methods.

The dual problem as concave maximization

The dual problem is:

$$\max_{\lambda \ge 0,\, \nu} \; d(\lambda,\nu)$$

with $d$ concave. So we are maximizing a concave function, which is the "nice" direction (equivalently, minimizing the convex function $-d$).

But there's a catch: $d(\lambda,\nu)$ is defined via an infimum over $x$. It may be nonsmooth.

Computing a subgradient of the dual function

Suppose for given $(\lambda,\nu)$ we can find

$$x(\lambda,\nu) \in \arg\min_x L(x,\lambda,\nu).$$

Then a subgradient of $d$ is given by the constraint residuals at that minimizing $x$:

  • For inequalities: the subdifferential of $d$ in $\lambda_i$ contains $g_i(x(\lambda,\nu))$.
  • For equalities: the subdifferential of $d$ in $\nu_j$ contains $h_j(x(\lambda,\nu))$.

Intuition:

  • If a constraint is violated ($g_i(x) > 0$), increasing $\lambda_i$ should raise the penalty and thus raise the lower bound.
  • If a constraint is satisfied with slack ($g_i(x) < 0$), increasing $\lambda_i$ may actually lower $d$ (and the optimizer will keep $\lambda_i$ from growing).

Dual ascent updates

A basic dual ascent / subgradient method:

  1. Given $(\lambda^k,\nu^k)$, compute

     $$x^k \in \arg\min_x L(x,\lambda^k,\nu^k).$$

  2. Update the multipliers using the residuals:

     $$\lambda^{k+1} = \Pi_{\lambda\ge 0}\big(\lambda^k + \alpha_k\, g(x^k)\big)$$

     $$\nu^{k+1} = \nu^k + \alpha_k\, h(x^k)$$

Where:

  • $g(x^k)$ is the vector $(g_1(x^k),\dots,g_m(x^k))$.
  • $h(x^k)$ is the vector $(h_1(x^k),\dots,h_p(x^k))$.
  • $\Pi_{\lambda\ge 0}$ projects onto the nonnegative orthant (componentwise max with 0).
  • $\alpha_k$ is a step size.

Why projection only for $\lambda$? Because dual feasibility requires $\lambda \ge 0$ for the inequalities, while $\nu$ is free.

How to choose step sizes (briefly)

Subgradient methods are sensitive to step sizes.

Common choices:

  • Constant step size: $\alpha_k = \alpha$ (often converges only to a neighborhood of the optimum).
  • Diminishing step size: $\alpha_k \to 0$ with $\sum_k \alpha_k = \infty$ (the classic convergence conditions for subgradient methods).

In practice, many systems use more advanced variants (accelerations, adaptive rules) or different splitting methods (e.g., ADMM) that are also rooted in the Lagrangian.
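The updates above fit in a few lines of Python. This is a minimal sketch, not a production solver: the caller supplies the inner $\arg\min_x$, and the names (`dual_ascent`, `argmin_L`, `alpha0`) are illustrative:

```python
import numpy as np

def dual_ascent(argmin_L, g, h, lam0, nu0, steps=500, alpha0=1.0):
    """Projected subgradient ascent on the dual function d(lam, nu).

    argmin_L(lam, nu) -> an x minimizing L(x, lam, nu)
    g(x), h(x)        -> inequality / equality residual vectors
    Uses the diminishing step alpha_k = alpha0 / (k + 1).
    """
    lam = np.asarray(lam0, dtype=float)
    nu = np.asarray(nu0, dtype=float)
    for k in range(steps):
        x = argmin_L(lam, nu)
        alpha = alpha0 / (k + 1)
        lam = np.maximum(0.0, lam + alpha * g(x))  # project onto lam >= 0
        nu = nu + alpha * h(x)                     # nu is unconstrained
    return x, lam, nu

# Example: minimize x^2 s.t. x >= 1, i.e. g(x) = 1 - x (no equalities).
# The inner minimizer of L(x, lam) = x^2 + lam*(1 - x) is x = lam/2.
x, lam, _ = dual_ascent(
    argmin_L=lambda lam, nu: lam / 2.0,
    g=lambda x: 1.0 - x,
    h=lambda x: np.zeros(0),
    lam0=[0.0], nu0=[],
)
print(x, lam)  # approaches x* = 1, lam* = 2
```

The diminishing step makes the iterates approach the optimum slowly; swapping in a constant step would instead hover near it.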

Why dual ascent can be efficient: separability and decomposition

A major payoff occurs when the Lagrangian separates over components of xx.

Suppose $x = (x_1,\dots,x_N)$ and

$$f(x) = \sum_{n=1}^N f_n(x_n)$$

and the constraints couple the blocks only through a simple sum, e.g.

$$\sum_{n=1}^N A_n x_n = b.$$

Then

$$L(x,\nu) = \sum_{n=1}^N f_n(x_n) + \nu^T\left(\sum_{n=1}^N A_n x_n - b\right) = \sum_{n=1}^N \left(f_n(x_n) + \nu^T A_n x_n\right) - \nu^T b.$$

Now the minimization over $x$ splits:

$$\inf_x L(x,\nu) = \sum_{n=1}^N \inf_{x_n}\left(f_n(x_n) + \nu^T A_n x_n\right) - \nu^T b.$$

So computing $d(\nu)$ amounts to $N$ smaller problems, often parallelizable.

This is the conceptual foundation of distributed optimization: multipliers coordinate many local optimizations.
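A concrete miniature of the decomposition, with $N = 3$ scalar quadratics coupled by one sum constraint (a sketch; the data and step size are illustrative):

```python
import numpy as np

# N = 3 local objectives f_n(x_n) = 0.5*(x_n - c_n)^2,
# coupled only through the single constraint x_1 + x_2 + x_3 = b.
c = np.array([1.0, 4.0, 7.0])
b = 6.0

def d(nu):
    # The inner minimization splits: each x_n = c_n - nu independently.
    x = c - nu
    return np.sum(0.5 * (x - c) ** 2 + nu * x) - nu * b

# Dual ascent on the single multiplier nu (gradient = coupling residual):
nu = 0.0
for _ in range(200):
    x = c - nu                      # N independent local solves
    nu += 0.1 * (np.sum(x) - b)     # coordinate the locals via the price nu

x = c - nu
print(x, d(nu))  # primal recovers x = [-1, 2, 5]; d(nu) reaches p* = 6
```

Each local solve depends only on its own data $c_n$ and the shared price $\nu$, which is exactly what makes the scheme distributable.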

Connecting back to KKT

If strong duality holds and the method converges to $(\lambda^*,\nu^*)$ with corresponding $x^*$ minimizing $L(\cdot,\lambda^*,\nu^*)$, then the KKT conditions emerge:

  • Stationarity: $x^*$ minimizes $L$.
  • Complementary slackness: in the limit, multipliers grow only where constraints are tight.
  • Feasibility: residuals are driven toward 0 by the multiplier updates.

This is a clean story:

  • KKT describes optimality.
  • Dual ascent is one route to satisfy KKT by iteratively tuning multipliers.

A second interactive-canvas moment (algorithm intuition)

If you can animate iterations:

  • Show a point $x^k$ moving in primal space (perhaps the unconstrained minimizer of $L$).
  • Show $\lambda^k$ changing over time.
  • Overlay the constraint boundary $g(x) = 0$.

What learners should observe:

  • If $x^k$ violates the constraint, $\lambda$ increases, "tilting" $L$ so the next minimizer is pushed back toward feasibility.
  • If $x^k$ is strictly feasible, $\lambda$ may decrease toward 0.

This makes dual ascent feel like an automatic constraint-enforcement mechanism driven by prices.

Worked Examples (3)

Dual function and dual optimum for a 1D quadratic with an inequality

Primal:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & x^2 \\ \text{subject to} \quad & x \ge 1 \end{aligned}$$

Rewrite $x \ge 1$ as $g(x) = 1 - x \le 0$.

  1. Form the Lagrangian (single multiplier):

     $$L(x,\lambda) = x^2 + \lambda(1-x), \quad \lambda \ge 0.$$

  2. Compute the dual function:

     $$d(\lambda) = \inf_x \left(x^2 - \lambda x + \lambda\right).$$

  3. Minimize the quadratic over $x$ by setting the derivative to zero:

     $$\frac{d}{dx}\left(x^2 - \lambda x + \lambda\right) = 2x - \lambda = 0 \;\Rightarrow\; x^*(\lambda) = \frac{\lambda}{2}.$$

  4. Plug back in to get the infimum value:

     $$\begin{aligned} d(\lambda) &= \left(\frac{\lambda}{2}\right)^2 - \lambda\left(\frac{\lambda}{2}\right) + \lambda \\ &= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} + \lambda \\ &= -\frac{\lambda^2}{4} + \lambda. \end{aligned}$$

  5. Solve the dual problem:

     $$\max_{\lambda\ge 0}\; -\frac{\lambda^2}{4} + \lambda.$$

  6. This is a concave parabola. Differentiate:

     $$\frac{d}{d\lambda}\left(-\frac{\lambda^2}{4} + \lambda\right) = -\frac{\lambda}{2} + 1.$$

  7. Set to zero: $-\lambda/2 + 1 = 0 \Rightarrow \lambda^* = 2$ (which satisfies $\lambda \ge 0$).

  8. Compute the dual optimal value:

     $$d^* = d(2) = -\frac{4}{4} + 2 = 1.$$

  9. Compute the primal optimum for comparison: the constraint forces $x \ge 1$, and $x^2$ is minimized over that set at $x = 1$, so $p^* = 1$.

Insight: Here $d^* = p^*$, so strong duality holds (the problem is convex and Slater holds, since e.g. $x = 2$ gives strict feasibility $1 - 2 < 0$). Also notice the meaning of $\lambda^* = 2$: it is the price that makes the unconstrained minimizer of $L$ land exactly on the boundary $x = 1$ (since $x^*(\lambda) = \lambda/2$).

Deriving the dual of a quadratic program with an equality constraint

Primal (a simple equality-constrained QP):

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}^n} \quad & \frac{1}{2}x^T Q x + c^T x \\ \text{subject to} \quad & Ax = b \end{aligned}$$

Assume $Q \succ 0$ (positive definite), so the objective is strictly convex and the minimizer is unique for each $\nu$.

  1. Form the Lagrangian (only equality multipliers):

     $$L(x,\nu) = \frac{1}{2}x^T Q x + c^T x + \nu^T(Ax - b).$$

  2. Compute the dual function:

     $$d(\nu) = \inf_x L(x,\nu).$$

  3. Minimize over $x$ by stationarity (differentiate w.r.t. $x$ and set to 0):

     $$\nabla_x L(x,\nu) = Qx + c + A^T\nu = 0.$$

  4. Solve for the minimizing $x$ as a function of $\nu$:

     $$x^*(\nu) = -Q^{-1}(c + A^T\nu).$$

  5. Plug back into the Lagrangian. First regroup:

     $$L(x,\nu) = \frac{1}{2}x^T Q x + (c + A^T\nu)^T x - \nu^T b.$$

  6. Use the standard quadratic minimization identity: for $Q \succ 0$,

     $$\inf_x \left(\frac{1}{2}x^T Q x + q^T x\right) = -\frac{1}{2} q^T Q^{-1} q,$$

     attained at $x = -Q^{-1}q$. Here $q = c + A^T\nu$.

  7. Therefore:

     $$d(\nu) = -\frac{1}{2}(c + A^T\nu)^T Q^{-1}(c + A^T\nu) - \nu^T b.$$

  8. Write the dual problem:

     $$\max_{\nu\in\mathbb{R}^p}\; -\frac{1}{2}(c + A^T\nu)^T Q^{-1}(c + A^T\nu) - \nu^T b.$$

  9. Optionally expand into a standard concave quadratic in $\nu$: using

     $$(c + A^T\nu)^T Q^{-1}(c + A^T\nu) = c^T Q^{-1} c + 2\nu^T A Q^{-1} c + \nu^T A Q^{-1} A^T \nu,$$

     we get

     $$d(\nu) = -\frac{1}{2}\nu^T(A Q^{-1} A^T)\nu - \nu^T(A Q^{-1} c + b) - \frac{1}{2}c^T Q^{-1} c.$$

Insight: This example shows the typical dual pattern: (1) build $L$, (2) compute $d$ by eliminating $x$ via an infimum, and (3) get a concave maximization over the multipliers. The matrix $A Q^{-1} A^T$ governs the dual curvature. When the primal is large but $p$ (the number of equality constraints) is small, the dual can be much cheaper.
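The closed-form dual can be checked numerically on a small instance (a sketch; the matrices are arbitrary illustrative data):

```python
import numpy as np

# Instance data: min 0.5 x^T Q x + c^T x  subject to  Ax = b.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])           # positive definite
c = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])           # p = 1 equality constraint
b = np.array([1.0])

Qinv = np.linalg.inv(Q)

def d(nu):
    # Closed-form dual from the derivation above.
    q = c + A.T @ nu
    return -0.5 * q @ Qinv @ q - nu @ b

# Dual optimum: set grad d(nu) = -A Q^{-1}(c + A^T nu) - b = 0 (linear in nu).
M = A @ Qinv @ A.T
nu_star = np.linalg.solve(M, -(A @ Qinv @ c + b))

# Primal solution recovered from the minimizing x of the Lagrangian.
x_star = -Qinv @ (c + A.T @ nu_star)
p_star = 0.5 * x_star @ Q @ x_star + c @ x_star

assert np.allclose(A @ x_star, b)      # primal feasibility
assert np.isclose(d(nu_star), p_star)  # strong duality: d* = p*
print(x_star, float(d(nu_star)))
```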

One iteration of dual ascent for an inequality-constrained problem

Primal:

$$\begin{aligned} \text{minimize}_{x\in\mathbb{R}} \quad & (x-3)^2 \\ \text{subject to} \quad & x \le 1 \end{aligned}$$

Rewrite as $g(x) = x - 1 \le 0$. Lagrangian: $L(x,\lambda) = (x-3)^2 + \lambda(x-1)$, with $\lambda \ge 0$.

We'll run one dual-ascent step with step size $\alpha = 0.5$, starting from $\lambda^0 = 0$.

  1. Given $\lambda^0 = 0$, minimize $L(x,\lambda^0)$ over $x$:

     $$L(x,0) = (x-3)^2.$$

     The minimizer is $x^0 = 3$.

  2. Compute the constraint residual at $x^0$:

     $$g(x^0) = x^0 - 1 = 2 > 0,$$

     so the constraint is violated.

  3. Update the multiplier using projected ascent:

     $$\lambda^1 = \max\{0,\, \lambda^0 + \alpha g(x^0)\} = \max\{0,\, 0 + 0.5 \cdot 2\} = 1.$$

  4. Now (to see the effect), compute the next primal minimizer for $\lambda^1 = 1$: minimize $L(x,1) = (x-3)^2 + (x-1)$. Differentiate:

     $$\frac{d}{dx}\left((x-3)^2 + x - 1\right) = 2(x-3) + 1 = 2x - 5.$$

     Set to zero: $2x - 5 = 0 \Rightarrow x^1 = 2.5$.

     Still infeasible, but closer to the constraint boundary $x \le 1$ than 3.

Insight: Dual ascent increases $\lambda$ when constraints are violated. That increase changes the unconstrained minimizer of $L$, pushing the next $x$ toward feasibility. With proper step sizes (and under convexity/regularity), iterating this drives the residuals down and converges to the constrained optimum.
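Continuing the worked iteration in code, using the closed-form inner minimizer $x^*(\lambda) = 3 - \lambda/2$ and the same constant step size:

```python
lam = 0.0     # lambda^0, as in the worked steps
alpha = 0.5   # same constant step size

for k in range(50):
    x = 3.0 - lam / 2.0                        # argmin_x L(x, lam)
    lam = max(0.0, lam + alpha * (x - 1.0))    # projected ascent on g(x) = x - 1

# The first iterations reproduce the steps above: x = 3, lam -> 1, x -> 2.5, ...
print(round(x, 4), round(lam, 4))  # converges to x* = 1.0, lam* = 4.0
```

Here the smooth, strongly curved dual makes even a constant step converge; in general (nonsmooth duals) the diminishing-step rules discussed earlier are the safe choice.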

Key Takeaways

  • The Lagrangian $L(x,\lambda,\nu) = f(x) + \lambda^T g(x) + \nu^T h(x)$ combines objective and constraints using multipliers, with $\lambda \ge 0$ for inequalities.

  • The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimal value $p^*$ when $\lambda \ge 0$ (weak duality).

  • The dual problem $\max_{\lambda\ge 0,\,\nu} d(\lambda,\nu)$ chooses the tightest lower bound; its value $d^*$ satisfies $d^* \le p^*$.

  • Even if the primal is nonconvex, the dual function is concave because it is the pointwise infimum of functions affine in $(\lambda,\nu)$.

  • In convex problems, Slater's condition (existence of a strictly feasible point) is a standard sufficient condition for strong duality and for KKT necessity/sufficiency.

  • When strong duality holds, matching primal/dual values provide a certificate of optimality: $d(\lambda,\nu) \le p^* \le f(x)$, and equality proves optimality.

  • Dual ascent (subgradient ascent on $d$) updates multipliers using constraint residuals; projection enforces $\lambda \ge 0$.

  • Duality becomes especially powerful when $\inf_x L$ decomposes across components of $x$, enabling parallel and distributed optimization.

Common Mistakes

  • Forgetting the sign restriction $\lambda \ge 0$ for inequality constraints, which breaks the lower-bound (weak duality) argument.

  • Confusing $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ with $\min_x f(x)$ or with the constrained minimum; $d$ is an unconstrained infimum over $x$ of the Lagrangian.

  • Assuming strong duality always holds; without convexity plus a constraint qualification (e.g., Slater), the duality gap can be positive.

  • Running dual ascent without projection on $\lambda$, or with unsuitable step sizes, leading to divergence or oscillation (common with nonsmooth dual functions).

Practice

easy

Compute the dual function for the problem: minimize $x^2$ subject to $x \le 2$. Then find the dual optimal value and compare it to the primal optimum.

Hint: Write the constraint as $g(x) = x - 2 \le 0$. Form $L(x,\lambda) = x^2 + \lambda(x-2)$ with $\lambda \ge 0$. Minimize the quadratic in $x$ to get $d(\lambda)$, then maximize over $\lambda \ge 0$.

Show solution

The constraint $x \le 2$ means $g(x) = x - 2 \le 0$.

Lagrangian: $L(x,\lambda) = x^2 + \lambda(x-2)$.

Dual function:

$$d(\lambda) = \inf_x \left(x^2 + \lambda x - 2\lambda\right).$$

Stationarity: $2x + \lambda = 0 \Rightarrow x^*(\lambda) = -\lambda/2$.

Plug in:

$$\begin{aligned} d(\lambda) &= \left(-\frac{\lambda}{2}\right)^2 + \lambda\left(-\frac{\lambda}{2}\right) - 2\lambda \\ &= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} - 2\lambda \\ &= -\frac{\lambda^2}{4} - 2\lambda. \end{aligned}$$

Maximize over $\lambda \ge 0$. This concave parabola is decreasing for $\lambda \ge 0$ (derivative $-\lambda/2 - 2 < 0$), so the maximum occurs at $\lambda^* = 0$.

Thus $d^* = d(0) = 0$.

Primal optimum: the unconstrained minimizer $x = 0$ is feasible (since $0 \le 2$), so $p^* = 0$.

Hence $d^* = p^* = 0$ (strong duality holds).

medium

Slater check: Consider minimizing a convex function $f(x)$ subject to the single constraint $\|x\|_2 \le 1$ (in $\mathbb{R}^n$). Does Slater's condition hold? What if the constraint is $\|x\|_2 \le 0$?

Hint: Slater requires a strictly feasible point for inequalities: find an $x$ such that $\|x\|_2 < 1$ (or $< 0$).

Show solution

For $\|x\|_2 \le 1$, Slater holds because $x = 0$ satisfies $\|0\|_2 = 0 < 1$.

For $\|x\|_2 \le 0$, the feasible set is only $\{0\}$ and there is no $x$ with $\|x\|_2 < 0$. So Slater fails.

hard

Dual ascent reasoning: For a problem with one inequality constraint $g(x) \le 0$, suppose at iteration $k$ you find $x^k \in \arg\min_x L(x,\lambda^k)$ and you observe $g(x^k) > 0$. What does the update $\lambda^{k+1} = \max\{0, \lambda^k + \alpha g(x^k)\}$ do, and why is that direction sensible for maximizing $d(\lambda)$?

Hint: Interpret $g(x^k)$ as a (sub)gradient of $d$ at $\lambda^k$. Also interpret $\lambda$ as a penalty weight on violations.

Show solution

If $g(x^k) > 0$, the constraint is violated at the current Lagrangian minimizer. The update increases $\lambda$ (since $\lambda^k + \alpha g(x^k) > \lambda^k$), making future minimizations of $L(x,\lambda)$ penalize positive $g(x)$ more heavily.

This is sensible because (under mild conditions) $g(x^k)$ is a subgradient of the concave dual function $d(\lambda) = \inf_x \big(f(x) + \lambda g(x)\big)$. For concave maximization, ascending along a subgradient increases $d$ locally. The projection $\max\{0,\cdot\}$ preserves dual feasibility $\lambda \ge 0$, which is required for weak duality and the correctness of the dual problem.
