Dual problems, weak and strong duality. Dual ascent.
Multi-session curriculum - substantial prior knowledge and complex material. Use mastery gates and deliberate practice.
Constrained optimization feels like juggling: you want to minimize an objective while never dropping the constraints. Lagrangian duality turns that juggling act into a different game—choose prices (multipliers) for constraint violation so that the best “priced” solution automatically respects the constraints. The surprising part: those prices also generate guaranteed lower bounds on the true optimum, and in many important cases the bound becomes exact.
Given a primal minimization problem with constraints, form the Lagrangian $L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$ with $\lambda \ge 0$. The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimum (weak duality). The dual problem maximizes this bound over multipliers. Under conditions like Slater’s condition for convex problems, the bound is tight (strong duality). Dual ascent updates multipliers to improve the bound and can solve large problems by exploiting separability and the structure of $L$.
A standard constrained minimization problem (the primal) is:

$$
\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & g_i(x) \le 0, \quad i = 1,\dots,m, \\
& h_j(x) = 0, \quad j = 1,\dots,p.
\end{aligned}
$$

Here:
- $x \in \mathbb{R}^n$ is the decision variable,
- $f$ is the objective,
- $g_i$ are the inequality constraints,
- $h_j$ are the equality constraints.
You already know KKT conditions and Lagrange multipliers. Duality builds on those ideas but changes the emphasis: instead of directly enforcing constraints, we price violations.
Define the Lagrangian:

$$
L(x,\lambda,\nu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \nu_j h_j(x), \qquad \lambda_i \ge 0.
$$

Intuition:
- Each $\lambda_i \ge 0$ is a price charged for violating the inequality $g_i(x) \le 0$.
- Each $\nu_j$ (sign-free) is a price on the equality $h_j(x) = 0$.
- Minimizing $L$ trades off the objective against these priced violations.
This mirrors a familiar idea: “soft constraints” with penalties. But duality is more precise: the penalties are variables you optimize.
Given multipliers $(\lambda,\nu)$ with $\lambda \ge 0$, consider minimizing the Lagrangian over $x$:

$$
d(\lambda,\nu) = \inf_x L(x,\lambda,\nu).
$$
This is called the dual function.
Why is it an infimum (not necessarily a minimum)? Because for some multipliers the Lagrangian might not attain a minimizer, or might go to $-\infty$.
Two important facts (we will prove them carefully soon):
1. For every $\lambda \ge 0$ and every $\nu$, $d(\lambda,\nu)$ is a lower bound on the primal optimal value (weak duality).
2. $d$ is concave in $(\lambda,\nu)$, even when the primal problem is nonconvex.
If each $(\lambda,\nu)$ with $\lambda \ge 0$ gives a lower bound, we should pick the best one:

$$
\begin{aligned}
\text{maximize}\quad & d(\lambda,\nu) \\
\text{subject to}\quad & \lambda \ge 0.
\end{aligned}
$$

This is the Lagrange dual problem. Its optimal value is denoted $d^\star$.
Duality is not just “another way to write the same thing.” It gives:
- guaranteed lower bounds, and hence certificates of optimality,
- a concave maximization problem even when the primal is nonconvex,
- algorithms (dual ascent) and decompositions for large, structured problems.
We’ll proceed slowly: first we’ll make the “lower bound” statement completely concrete; then we’ll talk about when the bound becomes tight; then we’ll turn that into an algorithm.
For a fixed $(\lambda,\nu)$ with $\lambda \ge 0$, the Lagrangian is just a scalar function of $x$:

$$
x \;\mapsto\; f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).
$$

Now imagine two categories of $x$: feasible points, which satisfy every constraint, and infeasible ones.

For feasible $x$, we can compare $L(x,\lambda,\nu)$ and $f(x)$.

If $x$ is feasible, then $g_i(x) \le 0$ and $h_j(x) = 0$, so each term $\lambda_i g_i(x) \le 0$ (since $\lambda_i \ge 0$) and each $\nu_j h_j(x) = 0$. Therefore:

$$
L(x,\lambda,\nu) \le f(x) \qquad \text{for every feasible } x.
$$
This inequality is the heart of weak duality.
By definition,

$$
d(\lambda,\nu) = \inf_x L(x,\lambda,\nu).
$$

The infimum over all $x$ is certainly no larger than the value at any particular feasible $x$:

$$
d(\lambda,\nu) \le L(x,\lambda,\nu) \le f(x).
$$

Now take the infimum over all feasible $x$ on the right-hand side. The best feasible objective value is the primal optimum value $p^\star$:

$$
d(\lambda,\nu) \le p^\star.
$$
This is weak duality.
Weak duality: For a primal minimization problem, any dual-feasible pair $(\lambda,\nu)$ with $\lambda \ge 0$ gives a lower bound on the primal optimal value.
Before moving on, check that you can literally compute a dual function once.
Consider the 1D problem: minimize $(x-1)^2$ subject to $x \ge 2$.

Rewrite the constraint as $g(x) = 2 - x \le 0$.

Try computing $d(\lambda) = \inf_x \left[(x-1)^2 + \lambda(2-x)\right]$ by minimizing over $x$ (complete the square or set the derivative to zero). Then compare $\max_{\lambda \ge 0} d(\lambda)$ with the primal optimum $p^\star$.

(We’ll do a closely related worked example later, but you should attempt this now; this is the first “feel” of $d(\lambda)$.)
Define:

$$
p^\star = \inf\{f(x) : x \text{ feasible}\}, \qquad d^\star = \sup_{\lambda \ge 0,\ \nu} d(\lambda,\nu).
$$

Weak duality implies:

$$
d^\star \le p^\star.
$$

The difference $p^\star - d^\star \ge 0$ is the duality gap.

This may feel counterintuitive: $d$ is concave even when the primal is nonconvex, since it is an infimum of functions affine in $(\lambda,\nu)$. But the infimum of affine functions is concave.
Proof sketch (worth internalizing): for each fixed $x$, define

$$
\ell_x(\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x).
$$

As a function of $(\lambda,\nu)$, $\ell_x$ is affine (linear + constant) because $f(x)$, $g_i(x)$, and $h_j(x)$ are fixed numbers once $x$ is fixed.

Then

$$
d(\lambda,\nu) = \inf_x \ell_x(\lambda,\nu).
$$

The pointwise infimum of affine functions is concave. Concretely, for $\theta \in [0,1]$ and multiplier pairs $u, v$:

$$
d(\theta u + (1-\theta)v)
= \inf_x \left[\theta\,\ell_x(u) + (1-\theta)\,\ell_x(v)\right]
\ge \theta \inf_x \ell_x(u) + (1-\theta)\inf_x \ell_x(v)
= \theta\, d(u) + (1-\theta)\, d(v).
$$
That inequality step is the key: infimum of a sum is ≥ sum of infima.
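Both facts can be checked numerically on a tiny instance. A sketch in plain NumPy; the problem here (minimize $x^2$ subject to $x \ge 1$, with $p^\star = 1$) is chosen purely for illustration, and `dual` is the closed-form dual function of that instance:

```python
import numpy as np

def dual(lam):
    # d(lam) = inf_x [ x^2 + lam*(1 - x) ]; the minimizer is x = lam/2
    x = lam / 2.0
    return x**2 + lam * (1.0 - x)

p_star = 1.0  # primal optimum of: minimize x^2 subject to x >= 1

lams = np.linspace(0.0, 6.0, 121)
vals = dual(lams)

# Weak duality: every d(lam) with lam >= 0 is a lower bound on p*.
weak_ok = bool(np.all(vals <= p_star + 1e-12))

# Concavity: the midpoint value is at least the average of endpoint values.
u, v = lams[:-1], lams[1:]
concave_ok = bool(np.all(dual((u + v) / 2) >= (dual(u) + dual(v)) / 2 - 1e-12))

print(weak_ok, concave_ok)  # True True
```

Every sampled value of $d$ stays below $p^\star$, and the midpoint test confirms concavity along the sampled grid.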
If your tech tree platform supports an interactive plot, this is the perfect place to attach it.
Suggested interaction:
- a slider for $\lambda$ (and $\nu$ if present),
- a plot of $x \mapsto L(x,\lambda)$ with its infimum marked,
- a second plot tracing $d(\lambda)$ as the slider moves, with a horizontal line at $p^\star$.

What you should see:
- the curve $L(\cdot,\lambda)$ tilting as $\lambda$ changes, with its minimum value never exceeding $p^\star$,
- $d(\lambda)$ rising to a peak and then falling: a concave curve whose maximum is $d^\star \le p^\star$.
This visualization makes the definition feel less abstract: you are literally pushing up the best guaranteed lower bound by tuning prices.
Weak duality always holds, but strong duality is where duality becomes a solving tool, not just a bounding tool.
Strong duality is most commonly guaranteed in convex optimization problems of the form:

$$
\begin{aligned}
\text{minimize}\quad & f(x) \\
\text{subject to}\quad & g_i(x) \le 0, \quad i = 1,\dots,m, \\
& Ax = b,
\end{aligned}
$$

where:
- $f$ and each $g_i$ are convex,
- the equality constraints are affine.

(Equality constraints more generally can be affine functions $h_j(x) = a_j^T x - b_j$; writing them collectively as $Ax = b$ is a clean special case.)
Convexity matters because it prevents “hidden” nonconvex geometry that can create a duality gap.
A widely used sufficient condition for strong duality is Slater’s condition:
For convex problems with inequality constraints and affine equalities, if there exists a point $\tilde{x}$ such that

$$
g_i(\tilde{x}) < 0 \quad \text{for all } i, \qquad A\tilde{x} = b,
$$
then strong duality holds and KKT conditions are necessary and sufficient.
So Slater is an “interior point exists” condition.
Why do we need it? Intuitively:
- A strictly feasible point guarantees the feasible region has some interior “room,” so small perturbations of the constraints change the optimal value in a controlled way.
- Without it, the feasible set can degenerate (collapse onto a boundary or a single point), and finite optimal multipliers may fail to exist.
If strong duality holds:

$$
d^\star = p^\star.
$$

Moreover, there exist optimal multipliers $(\lambda^\star,\nu^\star)$ such that:

$$
d(\lambda^\star,\nu^\star) = p^\star, \qquad
x^\star \in \arg\min_x L(x,\lambda^\star,\nu^\star), \qquad
\lambda_i^\star\, g_i(x^\star) = 0 \ \text{for all } i.
$$
In other words, duality explains why KKT works: it’s the optimality condition for a saddle point of the Lagrangian.
Think of the Lagrangian as a two-player game:
- the primal player chooses $x$ to make $L$ small,
- the dual player chooses prices $(\lambda,\nu)$ with $\lambda \ge 0$ to make $L$ large.

The primal problem is like:

$$
p^\star = \min_x\ \max_{\lambda \ge 0,\ \nu}\ L(x,\lambda,\nu).
$$

The dual problem is:

$$
d^\star = \max_{\lambda \ge 0,\ \nu}\ \min_x\ L(x,\lambda,\nu).
$$

A saddle point $(x^\star,\lambda^\star,\nu^\star)$ satisfies:

$$
L(x^\star,\lambda,\nu) \le L(x^\star,\lambda^\star,\nu^\star) \le L(x,\lambda^\star,\nu^\star)
$$

for all feasible multipliers ($\lambda \ge 0$) and all $x$.
Strong duality is closely related to the existence of such a saddle point.
Decide whether Slater’s condition holds.
Problem A: minimize a convex $f(x)$ subject to $x^2 \le 1$ (in $\mathbb{R}$).

Is there a strictly feasible point? Yes: $x = 0$ gives $x^2 = 0 < 1$. So Slater holds.

Problem B: minimize the same $f(x)$ subject to $x^2 \le 0$.

The feasible set is only $\{0\}$. There is no $x$ with $x^2 < 0$, so Slater fails.
Take 30 seconds to reflect: both problems are convex, but B has a “boundary-only” feasible set.
Slater failing does not automatically mean strong duality fails, but it removes a key guarantee.
For nonconvex problems, duality can be loose: it can happen that

$$
d^\star < p^\star,
$$

with a strictly positive duality gap even at the dual optimum.
This is why in combinatorial optimization, one often uses duality as a relaxation: solve the dual (or a related convex relaxation) to get bounds even if you can’t solve the primal exactly.
When strong duality holds, a dual optimal solution is a certificate of optimality: if you exhibit a feasible $x$ and a dual-feasible $(\lambda,\nu)$ with $f(x) = d(\lambda,\nu)$, then

$$
d(\lambda,\nu) \le d^\star \le p^\star \le f(x),
$$

so all inequalities are equalities and both $x$ and $(\lambda,\nu)$ are optimal.
This “sandwich” proof is one of the most useful ways to trust an optimizer.
Duality is not only theory; it directly yields optimization methods.
The dual problem is:

$$
\max_{\lambda \ge 0,\ \nu}\ d(\lambda,\nu),
$$

with $d$ concave. So we are maximizing a concave function, which is the “nice” direction (equivalently, minimizing the convex function $-d$).

But there’s a catch: $d$ is defined via an infimum over $x$. It may be nonsmooth.

Suppose for given $(\lambda,\nu)$ we can find

$$
x^+ \in \arg\min_x L(x,\lambda,\nu).
$$

Then a subgradient of $d$ is given by the constraint residuals at that minimizing $x^+$:

$$
g(x^+) = \big(g_1(x^+),\dots,g_m(x^+)\big) \ \text{for the } \lambda\text{-block}, \qquad
h(x^+) = \big(h_1(x^+),\dots,h_p(x^+)\big) \ \text{for the } \nu\text{-block}.
$$
Intuition:
- If $g_i(x^+) > 0$, the constraint is violated at the current minimizer, so its price $\lambda_i$ should rise.
- If $g_i(x^+) < 0$, the constraint is slack, so its price can fall (but never below zero).
- The equality residual $h_j(x^+)$ pushes $\nu_j$ up or down to shrink the residual.

A basic dual ascent / subgradient method:

$$
\begin{aligned}
x^{k+1} &\in \arg\min_x L(x,\lambda^k,\nu^k), \\
\lambda^{k+1} &= \big[\lambda^k + \alpha_k\, g(x^{k+1})\big]_+, \\
\nu^{k+1} &= \nu^k + \alpha_k\, h(x^{k+1}).
\end{aligned}
$$

Where:
- $\alpha_k > 0$ is a step size,
- $[\cdot]_+$ is the componentwise projection $\max(\cdot, 0)$,
- $g(x) = (g_1(x),\dots,g_m(x))$ and $h(x) = (h_1(x),\dots,h_p(x))$ stack the residuals.

Why projection only for $\lambda$? Because dual feasibility requires $\lambda \ge 0$ for inequalities, but $\nu$ is free.
Subgradient methods are sensitive to step sizes.
Common choices:
- a constant step $\alpha_k = \alpha$ (simple, but typically only reaches a neighborhood of the optimum),
- diminishing steps with $\sum_k \alpha_k = \infty$ and $\sum_k \alpha_k^2 < \infty$ (e.g., $\alpha_k = c/k$), which guarantee convergence for subgradient methods.
In practice, many systems use more advanced variants (accelerations, adaptive rules) or different splitting methods (e.g., ADMM) that are also rooted in the Lagrangian.
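The sensitivity is easy to demonstrate. A sketch on an illustrative 1D problem (minimize $x^2$ subject to $x \ge 1$, whose dual optimum is $\lambda^\star = 2$): a constant step in the stable range converges, a too-large constant step oscillates forever, and diminishing steps converge more slowly but safely.

```python
def dual_ascent(alpha_fn, iters=200, lam0=0.0):
    """Projected subgradient ascent on the dual of: min x^2 s.t. x >= 1."""
    lam, traj = lam0, []
    for k in range(iters):
        x = lam / 2.0                                 # argmin_x x^2 + lam*(1 - x)
        residual = 1.0 - x                            # g(x): subgradient of d at lam
        lam = max(0.0, lam + alpha_fn(k) * residual)  # ascend, project onto lam >= 0
        traj.append(lam)
    return traj

good = dual_ascent(lambda k: 1.0)             # converges to lam* = 2
bad = dual_ascent(lambda k: 5.0, iters=30)    # oscillates between 0 and 5
dim = dual_ascent(lambda k: 1.0 / (k + 1))    # diminishing: slower but convergent

print(good[-1], bad[-4:], round(dim[-1], 2))
```

With the constant step $\alpha = 5$, the projected iterates bounce between $0$ and $5$ and never settle; the diminishing schedule ends close to $2$ but approaches it only slowly.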
A major payoff occurs when the Lagrangian separates over components of $x$.
Suppose $x = (x_1,\dots,x_N)$ and

$$
f(x) = \sum_{i=1}^{N} f_i(x_i),
$$

and constraints couple them only through a simple sum, e.g.

$$
\sum_{i=1}^{N} A_i x_i = b.
$$

Then

$$
L(x,\nu) = \sum_{i=1}^{N} \Big( f_i(x_i) + \nu^T A_i x_i \Big) - \nu^T b.
$$

Now the minimization over $x$ splits:

$$
d(\nu) = \sum_{i=1}^{N} \inf_{x_i} \Big( f_i(x_i) + \nu^T A_i x_i \Big) - \nu^T b.
$$

So computing $d$ is $N$ smaller problems, often parallelizable.
This is the conceptual foundation of distributed optimization: multipliers coordinate many local optimizations.
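A minimal sketch of this coordination pattern (the quadratic costs and the data values are illustrative): two agents each minimize a local quadratic, coupled only by a shared budget $x_1 + x_2 = c$; the price $\nu$ is all they need to exchange.

```python
# Two local problems min (x_i - a_i)^2, coupled by x1 + x2 = c.
# Optimal here: x1 = 2, x2 = 0, with price nu* = 2 (illustrative data).
a1, a2, c = 3.0, 1.0, 2.0

def local_solve(ai, nu):
    # argmin_x (x - ai)^2 + nu*x  =>  x = ai - nu/2 (each agent solves alone)
    return ai - nu / 2.0

nu, alpha = 0.0, 0.5
for _ in range(100):
    x1 = local_solve(a1, nu)       # the two solves are independent;
    x2 = local_solve(a2, nu)       # in a real system they run in parallel
    nu += alpha * (x1 + x2 - c)    # price update from the coupling residual

print(round(x1, 4), round(x2, 4), round(nu, 4))  # 2.0 0.0 2.0
```

No agent ever sees the other’s objective; the shared price alone drives the pair onto the budget constraint.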
If strong duality holds and the method converges to $(\lambda^\star,\nu^\star)$ with corresponding minimizing $x^\star$, then the KKT conditions emerge:
- stationarity: $x^\star$ minimizes $L(\cdot,\lambda^\star,\nu^\star)$,
- primal feasibility: $g(x^\star) \le 0$ and $h(x^\star) = 0$,
- dual feasibility: $\lambda^\star \ge 0$,
- complementary slackness: $\lambda_i^\star\, g_i(x^\star) = 0$ for all $i$.

This is a clean story: the multiplier updates stall exactly when the residuals stop pushing, and that stalling condition is feasibility plus complementary slackness. Prices stop moving precisely when violations are gone or their prices are zero.
If you can animate iterations:
- show $x^k$ moving on the primal side and the prices $(\lambda^k,\nu^k)$ on the dual side,
- overlay the constraint boundary and the current unconstrained minimizer of $L(\cdot,\lambda^k,\nu^k)$.

What learners should observe:
- when constraints are violated, prices rise and push the next $x^{k+1}$ toward feasibility,
- the residuals shrink while $d(\lambda^k,\nu^k)$ climbs toward $p^\star$.
This makes dual ascent feel like an automatic constraint-enforcement mechanism driven by prices.
Primal:

$$
\text{minimize } x^2 \quad \text{subject to } x \ge 1.
$$

Rewrite as $g(x) = 1 - x \le 0$.

Form the Lagrangian (single multiplier $\lambda \ge 0$):

$$
L(x,\lambda) = x^2 + \lambda(1 - x).
$$

Compute the dual function:

$$
d(\lambda) = \inf_x \left[ x^2 + \lambda(1 - x) \right].
$$

Minimize the quadratic over $x$ by setting the derivative to zero:

$$
2x - \lambda = 0 \quad\Longrightarrow\quad x = \frac{\lambda}{2}.
$$
Plug back in to get the infimum value:

$$
\begin{aligned}
d(\lambda)
&= \left(\frac{\lambda}{2}\right)^2 - \lambda\left(\frac{\lambda}{2}\right) + \lambda \\
&= \frac{\lambda^2}{4} - \frac{\lambda^2}{2} + \lambda \\
&= -\frac{\lambda^2}{4}+\lambda.
\end{aligned}
$$
Solve the dual problem:

$$
\max_{\lambda \ge 0}\ d(\lambda) = -\frac{\lambda^2}{4} + \lambda.
$$

This is a concave parabola. Differentiate:

$$
d'(\lambda) = -\frac{\lambda}{2} + 1.
$$

Set to zero: $\lambda^\star = 2$ (and it satisfies $\lambda^\star \ge 0$).

Compute the dual optimum value:

$$
d^\star = d(2) = -\frac{2^2}{4} + 2 = 1.
$$

Compute the primal optimum for comparison: the constraint forces $x \ge 1$, and $x^2$ is minimized over that set at $x^\star = 1$, so $p^\star = 1$.

Insight: Here $d^\star = p^\star = 1$, so strong duality holds (the problem is convex, and Slater holds since e.g. $x = 2$ gives strict feasibility $g(2) = -1 < 0$). Also notice the meaning of $\lambda^\star = 2$: it is the price that makes the unconstrained minimizer of $L$ land exactly at the boundary (since $\lambda^\star/2 = 1$).
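As a sanity check, the closed-form $d(\lambda) = -\lambda^2/4 + \lambda$ can be compared against a brute-force numeric infimum. A sketch; the grid bounds are arbitrary but wide enough to contain each minimizer $x = \lambda/2$:

```python
import numpy as np

# The example problem: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
def L(x, lam):
    return x**2 + lam * (1.0 - x)

xs = np.linspace(-5.0, 5.0, 200001)  # fine grid for a numeric infimum
for lam in [0.0, 1.0, 2.0, 3.5]:
    d_grid = L(xs, lam).min()
    d_closed = -lam**2 / 4.0 + lam
    assert abs(d_grid - d_closed) < 1e-6  # matches the closed form

d_star = -2.0**2 / 4.0 + 2.0  # dual optimum at lambda* = 2
p_star = 1.0                  # primal optimum at x* = 1
print(d_star == p_star)  # True: zero duality gap
```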
Primal (a simple equality-constrained QP):

$$
\text{minimize } \tfrac{1}{2}x^TQx + c^Tx \quad \text{subject to } Ax = b.
$$

Assume $Q \succ 0$ (positive definite), so the objective is strictly convex and the minimizer is unique for each $\nu$.

Form the Lagrangian (only equality multipliers):

$$
L(x,\nu) = \tfrac{1}{2}x^TQx + c^Tx + \nu^T(Ax - b).
$$

Compute the dual function:

$$
d(\nu) = \inf_x L(x,\nu).
$$

Minimize over $x$ by stationarity (differentiate w.r.t. $x$ and set to 0):

$$
\nabla_x L(x,\nu) = Qx + c + A^T\nu = 0.
$$

Solve for the minimizing $x$ as a function of $\nu$:

$$
x(\nu) = -Q^{-1}\left(c + A^T\nu\right).
$$
Plug back into the Lagrangian. First expand:

$$
\begin{aligned}
L(x,\nu)
&=\frac{1}{2}x^TQx + (c+A^T\nu)^T x - \nu^T b.
\end{aligned}
$$
Use the standard quadratic minimization identity: for $Q \succ 0$,

$$
\inf_x \left[ \tfrac{1}{2}x^TQx + r^Tx \right] = -\tfrac{1}{2}\, r^T Q^{-1} r,
$$

attained at $x = -Q^{-1}r$.

Here $r = c + A^T\nu$.
Therefore:

$$
\begin{aligned}
d(\nu)
&= -\frac{1}{2}(c+A^T\nu)^T Q^{-1}(c+A^T\nu) - \nu^T b.
\end{aligned}
$$
Write the dual problem:

$$
\max_{\nu}\ -\tfrac{1}{2}(c+A^T\nu)^T Q^{-1}(c+A^T\nu) - \nu^T b.
$$
Optionally expand into a standard concave quadratic form in $\nu$:

$$
\begin{aligned}
(c+A^T\nu)^T Q^{-1}(c+A^T\nu)
= c^TQ^{-1}c + 2\nu^T A Q^{-1}c + \nu^T A Q^{-1} A^T \nu.
\end{aligned}
$$
Insight: This example shows the typical dual pattern: (1) build $L$, (2) compute $d$ by eliminating $x$ via an infimum, and (3) get a concave maximization over the multipliers. The matrix $AQ^{-1}A^T$ governs the dual curvature. When the primal is large but the number of equalities is small, the dual can be much cheaper.
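The whole pattern fits in a few lines of NumPy. The sketch below, with small illustrative $Q$, $c$, $A$, $b$, maximizes the concave dual in closed form and confirms $d(\nu^\star) = p^\star$ against the primal KKT system:

```python
import numpy as np

# Small equality-constrained QP: minimize 1/2 x^T Q x + c^T x  s.t.  A x = b.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite (illustrative data)
c = np.array([1.0, -1.0])
A = np.array([[1.0, 1.0]])               # a single equality constraint
b = np.array([1.0])

def d(nu):
    # d(nu) = -1/2 (c + A^T nu)^T Q^{-1} (c + A^T nu) - nu^T b
    r = c + A.T @ nu
    return -0.5 * r @ np.linalg.solve(Q, r) - nu @ b

# The dual is an unconstrained concave quadratic in nu; setting its gradient
# to zero gives (A Q^{-1} A^T) nu = -(A Q^{-1} c + b).
AQinv = A @ np.linalg.inv(Q)
nu_star = np.linalg.solve(AQinv @ A.T, -(AQinv @ c + b))

# Primal optimum from the KKT system, for comparison.
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
x_star = np.linalg.solve(K, np.concatenate([-c, b]))[:2]
p_star = 0.5 * x_star @ Q @ x_star + c @ x_star

print(np.isclose(d(nu_star), p_star))  # strong duality: d* = p*
```

Note that the dual solve factors only the $1 \times 1$ matrix $AQ^{-1}A^T$, which is the cheapness claim above in miniature.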
Primal:

$$
\text{minimize } (x-3)^2 \quad \text{subject to } x \le 1.
$$

Rewrite as $g(x) = x - 1 \le 0$. Lagrangian: $L(x,\lambda) = (x-3)^2 + \lambda(x-1)$, $\lambda \ge 0$.

We’ll run one dual-ascent step with step size $\alpha = 1$, starting from $\lambda^0 = 0$.

Given $\lambda^0 = 0$, minimize over $x$:

$$
x^1 = \arg\min_x (x-3)^2 = 3.
$$

The minimizer is $x^1 = 3$.

Compute the constraint residual at $x^1$:

$$
g(x^1) = 3 - 1 = 2 > 0,
$$

so the constraint is violated.

Update the multiplier using projected ascent:

$$
\lambda^1 = \big[\lambda^0 + \alpha\, g(x^1)\big]_+ = [0 + 1\cdot 2]_+ = 2.
$$

Now (to see the effect), compute the next primal minimizer for $\lambda^1 = 2$:

Minimize $(x-3)^2 + 2(x-1)$.

Differentiate:

$$
2(x-3) + 2 = 0.
$$

Set to zero: $x^2 = 2$.

Still infeasible ($2 > 1$), but closer to the constraint boundary than $3$.
Insight: Dual ascent increases $\lambda$ when constraints are violated. That increase changes the unconstrained minimizer of $L$, pushing the next $x$ toward feasibility. With proper step sizes (and under convexity/regularity), iterating this can drive the residuals down and converge to the constrained optimum.
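Iterating the same kind of update shows the full trajectory. A sketch with illustrative choices: objective $(x-3)^2$, constraint $x \le 1$, and constant step size $\alpha = 1$; the KKT conditions for that instance give $x^\star = 1$, $\lambda^\star = 4$.

```python
# Dual ascent on: minimize (x - 3)^2 subject to x <= 1 (g(x) = x - 1 <= 0),
# with constant step size alpha = 1 (illustrative choices).
lam, alpha, traj = 0.0, 1.0, []
for _ in range(60):
    x = 3.0 - lam / 2.0                      # argmin_x (x - 3)^2 + lam*(x - 1)
    lam = max(0.0, lam + alpha * (x - 1.0))  # projected multiplier update
    traj.append((x, lam))

print(traj[0], traj[1])  # (3.0, 2.0) (2.0, 3.0): x moves toward the boundary
print(traj[-1])          # approaches x* = 1 with lam* = 4
```

The price rises monotonically until the minimizer of the priced objective lands on the boundary, at which point the residual and the updates vanish together.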
The Lagrangian $L(x,\lambda,\nu) = f(x) + \sum_i \lambda_i g_i(x) + \sum_j \nu_j h_j(x)$ combines objective and constraints using multipliers, with $\lambda \ge 0$ for inequalities.

The dual function $d(\lambda,\nu) = \inf_x L(x,\lambda,\nu)$ is always a lower bound on the primal optimum value when $\lambda \ge 0$ (weak duality).

The dual problem chooses the tightest lower bound; its value satisfies $d^\star \le p^\star$.

Even if the primal is nonconvex, the dual function is concave because it is the pointwise infimum of functions affine in $(\lambda,\nu)$.
In convex problems, Slater’s condition (existence of a strictly feasible point) is a standard sufficient condition for strong duality and KKT necessity/sufficiency.
When strong duality holds, matching primal/dual values provide a certificate of optimality: $d(\lambda,\nu) \le p^\star \le f(x)$, and equality proves optimality.

Dual ascent (subgradient ascent on $d$) updates multipliers using constraint residuals; projection enforces $\lambda \ge 0$.

Duality becomes especially powerful when $L$ decomposes across components of $x$, enabling parallel and distributed optimization.
Forgetting the sign restriction $\lambda \ge 0$ for inequality constraints, which breaks the lower-bound (weak duality) argument.

Confusing $d(\lambda,\nu)$ with $f(x)$ or with the constrained minimum; $d$ is an unconstrained infimum over $x$ of the Lagrangian.
Assuming strong duality always holds; without convexity + a constraint qualification (e.g., Slater), the duality gap can be positive.
Running dual ascent without projection on $\lambda$ or with unsuitable step sizes, leading to divergence or oscillation (common with nonsmooth dual functions).
Compute the dual function for the problem: minimize $x^2$ subject to $x \le 1$. Then find the dual optimum value and compare to the primal optimum.

Hint: Write the constraint as $g(x) = x - 1 \le 0$. Form $L(x,\lambda) = x^2 + \lambda(x-1)$ with $\lambda \ge 0$. Minimize the quadratic in $x$ to get $d(\lambda)$, then maximize over $\lambda \ge 0$.
The constraint $x \le 1$ means $g(x) = x - 1 \le 0$.

Lagrangian: $L(x,\lambda) = x^2 + \lambda(x-1)$.

Dual function:

$$
d(\lambda) = \inf_x \left[ x^2 + \lambda(x-1) \right].
$$

Stationarity: $2x + \lambda = 0 \Rightarrow x = -\lambda/2$.

Plug in:

$$
d(\lambda) = \frac{\lambda^2}{4} - \frac{\lambda^2}{2} - \lambda = -\frac{\lambda^2}{4} - \lambda.
$$

Maximize over $\lambda \ge 0$. This concave parabola is decreasing for $\lambda \ge 0$ (derivative $-\lambda/2 - 1 < 0$), so the maximum occurs at $\lambda^\star = 0$.

Thus $d^\star = d(0) = 0$.

Primal optimum: the unconstrained minimizer $x = 0$ is feasible (since $0 \le 1$), so $p^\star = 0$.

Hence $d^\star = p^\star = 0$ (strong duality holds).
Slater check: Consider minimizing a convex function $f$ subject to the single constraint $x^2 \le 1$ (in $\mathbb{R}$). Does Slater’s condition hold? What if the constraint is $x^2 \le 0$?

Hint: Slater requires a strictly feasible point for inequalities: find an $x$ such that $x^2 - 1 < 0$ (or $x^2 < 0$).

For $x^2 \le 1$, Slater holds because $x = 0$ satisfies $x^2 = 0 < 1$.

For $x^2 \le 0$, the feasible set is only $\{0\}$ and there is no $x$ with $x^2 < 0$. So Slater fails.
Dual ascent reasoning: For a problem with one inequality constraint $g(x) \le 0$, suppose at iteration $k$ you find $x^{k+1} \in \arg\min_x L(x,\lambda^k)$ and you observe $g(x^{k+1}) > 0$. What does the update $\lambda^{k+1} = [\lambda^k + \alpha_k\, g(x^{k+1})]_+$ do, and why is that direction sensible for maximizing $d$?

Hint: Interpret $g(x^{k+1})$ as a (sub)gradient of $d$ at $\lambda^k$. Also interpret $\lambda$ as a penalty weight on violations.

If $g(x^{k+1}) > 0$, the constraint is violated at the current Lagrangian minimizer. The update increases $\lambda$ (since $\alpha_k\, g(x^{k+1}) > 0$), making future minimizations of $L$ penalize positive $g$ more heavily.

This is sensible because (under mild conditions) $g(x^{k+1})$ is a subgradient of the concave dual function $d$ at $\lambda^k$. For concave maximization, ascending along a subgradient increases $d$ locally. The projection preserves dual feasibility $\lambda \ge 0$, which is required for weak duality and the correctness of the dual problem.
Prerequisites you already have: KKT Conditions, Lagrange Multipliers
Next nodes this supports (typical unlocks):