First, second, third degree price discrimination. Two-part tariffs, bundling, versioning. Welfare effects by degree.
Firms from airlines to software vendors routinely charge different customers different prices — understanding how and why lets you predict firm behavior, welfare consequences, and regulatory trade-offs.
Price discrimination is the set of techniques a firm with market power uses to charge different prices to different buyers (or for different quantities/qualities) to increase profit; the lesson shows the mechanics of first-, second-, and third-degree discrimination, two-part tariffs, bundling and versioning, and their welfare effects.
Definition and motivation
Price discrimination is any pricing policy that charges different prices to different buyers for essentially the same good or service when differences are not driven by cost differences. For price discrimination to be feasible the firm needs: (i) market power (ability to set price above marginal cost), (ii) the ability to segment markets or observe signals correlated with willingness to pay, and (iii) limited arbitrage (customers cannot easily resell or arbitrage away price differences).
Degrees of price discrimination (core taxonomy)
Core intuition in 1 graph/formula
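Start from a standard linear monopoly demand: $P(Q) = a - bQ$, with constant marginal cost $c$. In Profit Maximization, we learned that a monopoly chooses output where marginal revenue equals marginal cost: $a - 2bQ = c$. Solving for monopoly quantity and price gives $Q_m = \frac{a - c}{2b}$ and $P_m = a - bQ_m = \frac{a + c}{2}$.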
Concrete numeric example: let , , . Then
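Consumer surplus (see Consumer Surplus) is the area between the demand curve and the price: here $CS_m = \tfrac{1}{2}(a - P_m)\,Q_m$, where $a$ is the demand intercept and $(P_m, Q_m)$ are the monopoly price and quantity.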
Why discriminate? Because different customers have different maximum willingness to pay. If the monopolist could charge each consumer her exact willingness to pay (first-degree), output expands to the efficient level where $P = MC$, eliminating deadweight loss and capturing all surplus as profit. In practice, firms use weaker devices (second- and third-degree) to harvest part of that surplus.
When thinking about welfare: price discrimination reallocates surplus from consumers to producers. It can increase total welfare relative to single-price monopoly when discrimination leads to higher aggregate output (reducing deadweight loss), but it can reduce welfare if it lowers output in some segments even while increasing profit.
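The single-price benchmark can be checked numerically. The following is a minimal Python sketch, assuming linear inverse demand P(Q) = a - bQ and constant marginal cost c; the function name `single_price_monopoly` and the values a=100, b=1, c=20 are hypothetical choices for illustration, not the lesson's own numbers.

```python
def single_price_monopoly(a: float, b: float, c: float) -> dict:
    """Single-price monopoly outcome for inverse demand P(Q) = a - b*Q and constant MC = c."""
    q_m = (a - c) / (2 * b)   # from MR = a - 2bQ = c
    p_m = a - b * q_m         # price read off the demand curve
    q_eff = (a - c) / b       # efficient output, where P = MC
    return {
        "quantity": q_m,
        "price": p_m,
        "profit": (p_m - c) * q_m,
        "consumer_surplus": 0.5 * (a - p_m) * q_m,           # triangle above the price
        "deadweight_loss": 0.5 * (p_m - c) * (q_eff - q_m),  # triangle of lost trades
    }


# Hypothetical illustration values (assumed, not taken from the lesson).
result = single_price_monopoly(a=100, b=1, c=20)
print(result)
```

With these assumed values the deadweight-loss triangle is as large as consumer surplus, which is the surplus that discrimination can potentially recover or capture.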
First-degree (perfect) price discrimination (PD)
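Mechanics: with perfect information the firm charges each infinitesimal unit at the consumer's marginal willingness to pay, so output is chosen where $P(Q) = MC$. Using $P(Q) = a - bQ$ and $MC = c$, efficiency requires $a - bQ = c$, so $Q^{*} = \frac{a - c}{b}$.
Compare with the single-price monopoly quantity $Q_m = \frac{a - c}{2b}$. The first-degree monopolist doubles output relative to single-price monopoly for a linear demand and captures the entire area under the demand curve up to $Q^{*}$ as revenue, so its profit is the area between demand and marginal cost, $\frac{(a - c)^2}{2b}$.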
Numeric example (continuing earlier): gives
Total welfare under first-degree PD equals the competitive (efficient) total surplus: profit equals the area under the demand curve up to the efficient quantity minus total cost, and consumer surplus is zero.
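A short Python sketch of the first-degree outcome, under the same linear-demand assumption P(Q) = a - bQ with constant marginal cost c. The function name `first_degree_outcome` and the parameter values are hypothetical and chosen only for illustration.

```python
def first_degree_outcome(a: float, b: float, c: float) -> dict:
    """Perfect price discrimination with P(Q) = a - b*Q and constant MC = c."""
    q_star = (a - c) / b               # output expands until P = MC
    profit = 0.5 * (a - c) * q_star    # whole triangle between demand and MC
    return {"quantity": q_star, "profit": profit, "consumer_surplus": 0.0}


# Hypothetical illustration values (assumed, not taken from the lesson).
a, b, c = 100, 1, 20
perfect = first_degree_outcome(a, b, c)
q_monopoly = (a - c) / (2 * b)                      # single-price benchmark
output_ratio = perfect["quantity"] / q_monopoly     # 2 for any linear demand
print(perfect, output_ratio)
```

The profit figure equals the sum of single-price profit, consumer surplus, and deadweight loss in the same market, which is one way to see that the firm captures the entire efficient surplus.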
Third-degree PD (observable groups)
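Suppose the monopolist serves two segments, 1 and 2, with inverse demands $P_1(Q_1) = a_1 - b_1 Q_1$ and $P_2(Q_2) = a_2 - b_2 Q_2$,
and common marginal cost $c$. The firm maximizes profit $\pi(Q_1, Q_2) = (P_1(Q_1) - c)\,Q_1 + (P_2(Q_2) - c)\,Q_2$.
First-order conditions (see Profit Maximization) set marginal revenue in each market equal to marginal cost: $a_i - 2 b_i Q_i = c$ for $i = 1, 2$.
Therefore $Q_i = \frac{a_i - c}{2 b_i}$ and $P_i = \frac{a_i + c}{2}$.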
Concrete numeric example: take (high-value market), (low-value market), and . Then
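Compare with single-price monopoly: aggregate demand is a simple multiple of one market's demand only if the markets are identical. With different intercepts, find the single optimal price by summing demands horizontally at each price. Third-degree PD never yields lower profit than a single price, because the firm could always choose equal prices in both markets; it typically yields strictly more, because the firm charges a higher price where willingness to pay is higher.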
Markup rule and elasticity
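A compact result: the Lerner index generalizes to each segment: $\frac{P_i - MC}{P_i} = -\frac{1}{\varepsilon_i}$,
where $\varepsilon_i$ is the price elasticity of demand in market $i$ evaluated at $P_i$. This follows from $MR_i = P_i\left(1 + \frac{1}{\varepsilon_i}\right) = MC$. Numeric example: if $\varepsilon_i = -2$ at the chosen price, then the markup is $\frac{P_i - MC}{P_i} = \frac{1}{2}$, that is, $P_i = 2\,MC$.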
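The inverse-elasticity rule can be turned into a small pricing helper. This is a sketch that treats elasticity as constant at the chosen price; the function names and the marginal cost of 10 are hypothetical illustration choices.

```python
def price_from_elasticity(mc: float, elasticity: float) -> float:
    """Profit-maximizing price from MR = P * (1 + 1/elasticity) = MC."""
    if elasticity >= -1:
        # A monopolist never prices on the inelastic part of demand.
        raise ValueError("elasticity must be below -1 at the optimum")
    return mc / (1 + 1 / elasticity)


def lerner_index(price: float, mc: float) -> float:
    """Markup as a share of price, (P - MC) / P."""
    return (price - mc) / price


# Hypothetical segments: same marginal cost, different elasticities (assumed values).
mc = 10.0
price_inelastic = price_from_elasticity(mc, -2.0)   # less elastic segment
price_elastic = price_from_elasticity(mc, -4.0)     # more elastic segment
print(price_inelastic, lerner_index(price_inelastic, mc))
print(price_elastic, lerner_index(price_elastic, mc))
```

The less elastic segment pays the higher price, which is the third-degree logic in one line.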
Welfare comparisons
A useful rule: when demands across segments are differently elastic, segmenting allows the monopolist to price closer to MC in elastic segments (in absolute terms, smaller markup) and farther above MC in inelastic segments; this can raise total quantity if previously demand in elastic segment was suppressed by a high single price. Always compute aggregate output to compare DWL explicitly.
Concrete welfare calculation (from earlier numeric example): compute CS and PS in each market by area formulas and sum. If the segmented output sum exceeds the single-price output, total DWL falls relative to single-price monopoly; otherwise it may rise.
Second-degree PD (menus and self-selection)
Second-degree discrimination offers a menu of nonlinear prices so consumers self-select according to private type (willingness to pay). Typical devices: quantity discounts, block pricing, versioning, and two-part tariffs. To analyze, use mechanism design constraints: incentive compatibility (IC) and individual rationality (IR).
Two-part tariffs (TPT)
A two-part tariff is a price schedule consisting of a fixed fee $F$ plus a per-unit price $p$: a consumer buying $q$ units pays $F + pq$. For identical consumers with downward-sloping demand, the profit-maximizing two-part tariff sets $p = MC$ (so $p = 0$ when marginal cost is zero) to get efficient consumption, and sets $F$ equal to the consumer surplus at that price to extract the surplus entirely. This uses both prerequisites: from Profit Maximization, that the marginal price is set by marginal conditions; and from Consumer Surplus, how to compute the consumer surplus that determines $F$.
Concrete numeric example: single consumer with inverse demand , . Set so the consumer consumes . Consumer surplus is
A monopolist sets $F = CS$ and obtains profit $\pi = F$ (the per-unit margin is zero). Consumer surplus net of the fee is zero; total output is efficient. Against heterogeneous consumers, one cannot extract full surplus from all types with a single fee $F$; the monopolist trades off a higher fee against the participation of low-value types.
Example with two discrete types (numerical): suppose two consumers each have inverse demands , , and each consumer buys individually. Let . If the monopolist sets , then , and their consumer surpluses are , . To sell to both, the maximum uniform fee is , so profit . Alternatively, exclude the low type and charge to the high type, profit . So the firm may prefer to price so that only high-type participates. This illustrates the trade-off in two-part tariffs with heterogeneity.
Versioning and quality differentiation
Versioning sells multiple versions (qualities) of a product at different prices intended to segment consumers by willingness to pay. The canonical analysis (Mussa–Rosen) models consumers by a one-dimensional willingness-to-pay parameter $\theta$ and quality $q$. Utility for a type $\theta$ buying quality $q$ at price $p$ is $U = \theta q - p$. Cost is $C(q)$ (often linear in $q$). The monopolist offers a menu $\{(q_i, p_i)\}$ to maximize profits subject to IC and IR constraints. A key result: the high type gets an efficient quality choice (no distortion) while lower types are allocated a quality distorted downward to loosen IC constraints and extract surplus. This is a direct application of the constrained profit maximization taught in Profit Maximization.
Concrete numeric illustration (sketch): Suppose two types: low , high , cost , proportion of high types . The efficient quality for type solves where , so both types would want positive quality if . But the optimal contract solves a constrained optimization and typically yields lower than efficient because the monopolist must prevent high types from pretending to be low.
Bundling
Bundling is selling two (or more) goods together at a package price. Bundling is attractive when consumer valuations for goods are negatively correlated or when individual valuations have high variance: bundling reduces the dispersion of the total valuation across consumers relative to its mean, allowing a single package price to extract more surplus. For independent uniformly distributed valuations, pure bundling can raise profit versus separate pricing. Concrete numeric example: two goods A and B; consumers have independent uniform valuations on [0,100], and marginal cost is taken to be zero. Sold separately, the monopoly price of each good is 50, which sells with probability Pr(v>50)=0.5, so expected revenue is 50*0.5=25 per good and 50 per consumer in total. Under pure bundling the sum of valuations has a triangular distribution on [0,200]. A package price of 100 (the combined mean) sells with probability 0.5 and only matches the 50 from separate sales, while the revenue-maximizing package price is about 81.6, which sells with probability 2/3 and earns about 54.4. Always compute expected revenue to compare.
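The comparison between separate pricing and pure bundling can be reproduced with a price grid. This is a rough sketch assuming two independent valuations uniform on [0,100] and zero marginal cost; the helper names and the 0.01 grid step are hypothetical choices, and the sale probability for the bundle uses the triangular distribution of the sum of two uniforms.

```python
def single_sale_probability(price: float, top: float = 100.0) -> float:
    """Pr(v >= price) for a valuation uniform on [0, top]."""
    return min(max(1.0 - price / top, 0.0), 1.0)


def bundle_sale_probability(price: float, top: float = 100.0) -> float:
    """Pr(v_A + v_B >= price) for independent valuations uniform on [0, top]."""
    if price <= 0:
        return 1.0
    if price <= top:
        return 1.0 - price ** 2 / (2 * top ** 2)
    if price <= 2 * top:
        return (2 * top - price) ** 2 / (2 * top ** 2)
    return 0.0


# Candidate prices 0.00, 0.01, ..., 200.00 (grid step is an arbitrary choice).
grid = [i / 100 for i in range(20001)]

separate_price = max(grid, key=lambda p: p * single_sale_probability(p))
separate_revenue = 2 * separate_price * single_sale_probability(separate_price)

bundle_price = max(grid, key=lambda p: p * bundle_sale_probability(p))
bundle_revenue = bundle_price * bundle_sale_probability(bundle_price)

print(separate_price, separate_revenue)   # per-good price, revenue from both goods
print(bundle_price, bundle_revenue)       # package price, revenue per consumer
```

Under these assumptions the best package price sits below the mean bundle valuation, and the gain over separate pricing is modest (roughly 9 percent).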
Welfare implications of second-degree devices
Real-world applications (concrete examples)
Policy and regulation implications
Antitrust and consumer protection: regulators are concerned when PD is used to price discriminate against protected groups or to engage in price squeezing and exclusion. Price parity clauses (e.g., hotels and online platforms) restrict a platform’s ability to employ different prices on different channels. First-degree PD often raises distributional concerns despite improving allocative efficiency (it eliminates consumer surplus), so policy weighs equity versus efficiency.
Welfare nuances to remember
Downstream connections (what this enables you to analyze next)
Mastering price discrimination is essential for advanced topics in industrial organization and mechanism design. Specific forward-looking areas that build directly on this lesson:
Link back to prerequisites
This lesson builds on Profit Maximization, where we learned the MR=MC first-order conditions and the constrained optimization logic used for IC and IR constraints. It also uses Consumer Surplus, where we learned how to compute and interpret areas under demand curves to evaluate welfare changes, consumer surplus, and deadweight loss. When doing the numerical exercises below, explicitly solve MR=MC and compute CS areas as you learned in those prerequisites.
A monopolist serves two separate markets with inverse linear demands P1(Q1)=80-Q1 and P2(Q2)=60-Q2. Marginal cost is MC=10. Compute the profit-maximizing prices and quantities under third-degree discrimination. Then compute the monopoly single-price solution (firm must charge same price in both markets) and compare profits and total output.
For third-degree PD, set MR_i=MC in each market. For linear inverse demand P_i(Q_i)=a_i-Q_i, marginal revenue MR_i=a_i-2Q_i. (Reference: In Profit Maximization, we learned MR=MC.)
Market 1: a1=80, so MR1=80-2Q1. Set MR1=MC => 80-2Q1=10 => Q1=(80-10)/2=35. Price P1=80-35=45.
Market 2: a2=60, so MR2=60-2Q2. Set MR2=MC => 60-2Q2=10 => Q2=(60-10)/2=25. Price P2=60-25=35.
Compute profits: Profit = (P1-MC)Q1 + (P2-MC)Q2 = (45-10)*35 + (35-10)*25 = 35*35 + 25*25 = 1225 + 625 = 1850.
Compute total output under PD: Q_PD = Q1+Q2 = 35 + 25 = 60.
Now compute single-price monopoly: aggregate demand at price p is Q(p)=Q1(p)+Q2(p) where Q1(p)=80-p and Q2(p)=60-p, so Q(p)=140 - 2p => invert to get P(Q)=70 - Q/2. Then MR(Q)=70 - Q. Set MR=MC: 70 - Q = 10 => Q_single = 60. Price P_single = 70 - 60/2 = 70 - 30 = 40.
Single-price profit: (P_single - MC)Q_single = (40-10)*60 = 30*60 = 1800.
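Comparison: Third-degree PD profit = 1850 > Single-price profit = 1800. Total outputs are equal (both 60) in this example: PD charges a higher price in market 1 (45 versus 40) and a lower price in market 2 (35 versus 40), and with linear demands and both markets served under the single price, the output lost in market 1 exactly offsets the output gained in market 2. Welfare is not unchanged, however. Consumer surplus falls from 0.5*40^2 + 0.5*20^2 = 1000 to 0.5*35^2 + 0.5*25^2 = 925, so total surplus falls from 2800 to 2775: the same 60 units are allocated less efficiently because buyers in the two markets face different prices.
Insight: This example shows the mechanics: solve MR_i=MC for each market (use Profit Maximization). It also illustrates that PD need not change aggregate output: with linear demands, total output changes only if discrimination opens a market that the single price would leave unserved, and more generally the curvature of demand determines whether total output (and hence DWL) rises or falls. Finally, PD increased profit here because the monopolist exploited heterogeneity.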
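The figures in this worked example can be reproduced, and extended to consumer surplus, with a few lines of Python. This is a sketch assuming unit-slope linear demands Q_i = a_i - p, as in the example; the helper names are hypothetical. The uniform price is found by checking, for each k, the price that would be optimal if only the k highest-intercept markets were served, and keeping the candidate with the highest actual profit.

```python
def demand(intercepts, price):
    """Total quantity at a common price when market i has demand Q_i = a_i - price."""
    return sum(max(a - price, 0.0) for a in intercepts)


def third_degree(intercepts, mc):
    """(price, quantity) in each market from MR_i = MC with P_i = a_i - Q_i (needs a_i > mc)."""
    return [((a + mc) / 2, (a - mc) / 2) for a in intercepts]


def best_uniform_price(intercepts, mc):
    """Best single price across all markets, allowing for the kink in aggregate demand."""
    ranked = sorted(intercepts, reverse=True)
    candidates = [(sum(ranked[:k]) / k + mc) / 2 for k in range(1, len(ranked) + 1)]
    return max(candidates, key=lambda p: (p - mc) * demand(intercepts, p))


def surplus(intercepts, prices, mc):
    """Return (consumer surplus, profit) when market i faces prices[i]."""
    cs = sum(0.5 * max(a - p, 0.0) ** 2 for a, p in zip(intercepts, prices))
    profit = sum((p - mc) * max(a - p, 0.0) for a, p in zip(intercepts, prices))
    return cs, profit


intercepts, mc = [80, 60], 10   # values from the worked example
pd_prices = [p for p, _ in third_degree(intercepts, mc)]
p_uniform = best_uniform_price(intercepts, mc)
cs_pd, profit_pd = surplus(intercepts, pd_prices, mc)
cs_u, profit_u = surplus(intercepts, [p_uniform] * len(intercepts), mc)
print(pd_prices, profit_pd, cs_pd, cs_pd + profit_pd)
print(p_uniform, profit_u, cs_u, cs_u + profit_u)
```

Profit rises by 50 under discrimination while consumer surplus falls by 75, so total surplus is lower by 25 even though total output is the same in both regimes.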
There are two consumers (H and L) each with inverse demand P_i(q)=a_i-q, with a_H=120, a_L=60, common marginal cost MC=0. The monopolist can offer a two-part tariff (F,p). Find the optimal uniform per-unit price p and fixed fee F if the monopolist wants to sell to both consumers. Determine whether it is ever optimal to exclude the low type.
Under a two-part tariff, consumers choose q solving P_i(q)=p => q_i=a_i-p. For MC=0, profit per unit is (p-0), plus fixed fees.
As we learned in Profit Maximization, if consumers are identical, the profit-maximizing per-unit price is p=MC=0, which achieves the efficient quantity. But here types differ. Consider p=0 first: then q_H=120, q_L=60.
Compute consumer surplus at p=0 for each type: CS_i = 1/2 q_i (a_i - p) = 1/2 (a_i) (a_i) = a_i^2/2. So CS_H = 120^2/2 = 7200; CS_L = 60^2/2 = 1800.
If the monopolist sets a uniform fee F and p=0 and wants both to participate, F cannot exceed CS_L=1800. Profit = 2F + p(q_H+q_L) - cost = 2*1800 + 0 = 3600.
Consider excluding the low type: set p=0 and charge F=7200 to the high type only. Profit = 7200, which is greater than 3600. So exclusion dominates selling to both at p=0.
Consider alternatives with p>0: raising p lowers the fee that keeps the low type in (her surplus shrinks) but earns a per-unit margin, most of it from the high type, who buys more. Serving both requires F = CS_L(p) = 0.5(60-p)^2, so profit is π(p) = 2*0.5(60-p)^2 + p[(120-p)+(60-p)] = 3600 + 60p - p^2, which is maximized at p=30. Then q_H=90, q_L=30, F=0.5*30^2=450, and profit = 2*450 + 30*120 = 4500. This beats the 3600 earned at p=0, but it is still below the 7200 from excluding the low type.
Conclusion: the best tariff that serves both consumers is p=30, F=450 (profit 4500), yet the monopolist prefers to target the high type alone with p=0 and F=7200. When types are discrete and sufficiently heterogeneous, extracting the high type's full surplus is worth more than serving the low type.
Insight: This example demonstrates how two-part tariffs interact with heterogeneity: while p=MC is efficient in quantity, lump-sum extraction and participation constraints can lead the firm to exclude low-value consumers to maximize profit. It applies the consumer surplus calculations from Consumer Surplus directly.
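The best tariff that keeps every listed consumer in the market has a simple closed form for this demand family. The sketch below assumes unit-slope demands q_i = a_i - p, a uniform (p, F), and that the lowest served type still buys at the resulting price; the function names are hypothetical. Maximizing n*0.5*(a_min - p)^2 + (p - MC)*sum(a_i - p) over p gives p = MC + (mean of a_i) - a_min.

```python
def inclusive_two_part_tariff(intercepts, mc=0.0):
    """Best uniform (p, F) that serves every listed type, with demands q_i = a_i - p.

    Assumes the resulting p is below the lowest intercept, so everyone buys.
    """
    n = len(intercepts)
    a_min = min(intercepts)
    p = mc + sum(intercepts) / n - a_min          # first-order condition in p
    fee = 0.5 * (a_min - p) ** 2                  # lowest type's surplus at price p
    quantities = [a - p for a in intercepts]
    profit = n * fee + (p - mc) * sum(quantities)
    return p, fee, profit


def exclusion_profit(a_high, mc=0.0):
    """Profit from serving only the highest type at p = MC with F equal to her surplus."""
    return 0.5 * (a_high - mc) ** 2


# Values from the worked example: a_H = 120, a_L = 60, MC = 0.
p_both, fee_both, profit_both = inclusive_two_part_tariff([120, 60])
profit_high_only = exclusion_profit(120)
print(p_both, fee_both, profit_both, profit_high_only)
```

With a single consumer type the formula collapses to p = MC and F equal to that consumer's surplus, which is the identical-consumers benchmark.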
A monopolist sells a product that can be set to quality q at marginal cost c per unit of quality (cost C(q)=cq). There are two consumer types with utilities U=θq-p: type L has θ_L=6, type H has θ_H=10. Proportion of H is 1/2. c=2. Design qualities and prices (q_L,p_L) and (q_H,p_H) to maximize profit subject to incentive compatibility and individual rationality. Show qualitative results and compute the optimal q_L and q_H in the binding-IC case.
Write the firm’s expected profit as 0.5[(p_H - c q_H) + (p_L - c q_L)]. Constraints: (i) IR for low: θ_L q_L - p_L >= 0; (ii) IR for high: θ_H q_H - p_H >= 0; (iii) IC high: θ_H q_H - p_H >= θ_H q_L - p_L; (iv) IC low: θ_L q_L - p_L >= θ_L q_H - p_H. Typically IC low is slack; main binding constraints are IC high and IR low.
Standard trick: eliminate prices using the binding constraints. Suppose IC high binds and IR low binds. From IR low: p_L = θ_L q_L. From IC high binding: θ_H q_H - p_H = θ_H q_L - p_L => p_H = θ_H q_H - θ_H q_L + p_L = θ_H(q_H - q_L) + θ_L q_L = θ_H q_H - (θ_H - θ_L) q_L.
Substitute p_H and p_L into profit: π = 0.5[ p_H - c q_H + p_L - c q_L ] = 0.5[ θ_H q_H - (θ_H - θ_L) q_L - c q_H + θ_L q_L - c q_L ] = 0.5[ (θ_H - c) q_H + ( - (θ_H - θ_L) + θ_L - c ) q_L ]
Simplify coefficient of q_L: - (θ_H - θ_L) + θ_L - c = -θ_H + 2θ_L - c. Plug numbers θ_H=10, θ_L=6, c=2 gives coefficient for q_H: (10-2)=8, coefficient for q_L: -10 + 12 - 2 = 0. So profit simplifies to π = 0.5(8 q_H + 0q_L) = 4 q_H. Thus profit depends only on q_H when the two constraints bind with these parameters.
Profit is linear in both qualities, so there is no interior optimum: a first-order condition in q_H would require θ_H = c (10 = 2), which fails. Because θ_H > c, profit 4 q_H rises without bound in q_H, so the problem is well posed only if quality has an upper bound or cost is strictly convex. Assuming a maximum feasible quality q_max, the firm sets q_H = q_max, which is also the first-best quality for the high type. The coefficient on q_L is exactly zero with these numbers, so the firm is indifferent among all q_L between 0 and q_H (IC for the low type requires q_L <= q_H); a slightly higher θ_H or a larger share of high types would make the coefficient negative and push q_L to zero.
With a strictly convex cost of quality, the same steps yield the standard Mussa–Rosen result: q_H is efficient (no distortion at the top) and q_L is distorted downward relative to the first-best. The information rent left to the high type, (θ_H - θ_L) q_L, lowers p_H but does not change q_H; the firm reduces q_L precisely to shrink that rent. The key qualitative takeaway is downward distortion for the low type.
Insight: This worked example sketches how incentive compatibility and individual rationality create distortions in versioning. Even with only two types, the low-quality product is typically deliberately starved of quality to prevent high-type mimicry; this is a central constrained-optimization result that builds on the treatment of constrained optima in Profit Maximization.
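The price substitution used in this example (IR binding for the low type, IC binding for the high type) can be checked mechanically. The sketch below uses the example's parameters θ_L=6, θ_H=10, c=2 and a one-half share of high types; the qualities q_L=1 and q_H=3 are hypothetical values chosen only to test the constraints, and the function names are illustrative.

```python
def low_quality_coefficient(theta_l, theta_h, c, share_h):
    """Marginal profit from raising q_L: low-type margin minus the rent conceded to high types."""
    return (1 - share_h) * (theta_l - c) - share_h * (theta_h - theta_l)


def menu_prices(theta_l, theta_h, q_l, q_h):
    """Prices implied by binding IR (low type) and binding IC (high type)."""
    p_l = theta_l * q_l
    p_h = theta_h * q_h - (theta_h - theta_l) * q_l
    return p_l, p_h


def constraints_hold(theta_l, theta_h, q_l, q_h, p_l, p_h, tol=1e-9):
    """Check both IR and both IC constraints for utilities U = theta * q - p."""
    u_l = theta_l * q_l - p_l
    u_h = theta_h * q_h - p_h
    return (
        u_l >= -tol and u_h >= -tol
        and u_h >= theta_h * q_l - p_l - tol   # high type does not mimic low
        and u_l >= theta_l * q_h - p_h - tol   # low type does not mimic high
    )


theta_l, theta_h, c, share_h = 6, 10, 2, 0.5
q_l, q_h = 1.0, 3.0   # hypothetical qualities (assumed for the check)
p_l, p_h = menu_prices(theta_l, theta_h, q_l, q_h)
expected_profit = share_h * (p_h - c * q_h) + (1 - share_h) * (p_l - c * q_l)
coefficient = low_quality_coefficient(theta_l, theta_h, c, share_h)
print(p_l, p_h, expected_profit, coefficient)
```

Expected profit equals 4 times q_H regardless of q_L here because the coefficient on q_L is zero for these parameters; raising the share of high types makes it negative.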
Price discrimination requires (i) market power, (ii) ability to segment or signal willingness to pay, and (iii) limited arbitrage.
First-degree PD achieves allocative efficiency (P=MC) and extracts all consumer surplus; it increases total output relative to single-price monopoly.
Third-degree PD sets MR_i=MC in each observable segment and yields markups according to the Lerner rule: (P-MC)/P=-1/ε; welfare effects relative to single-price monopoly are ambiguous and depend on output changes across segments.
Second-degree PD (menus, two-part tariffs, versioning) uses self-selection; two-part tariffs can implement efficiency at the margin (p=MC) but lump-sum fees redistribute surplus; heterogeneity can lead to exclusion.
Versioning solves a constrained optimization (IC and IR): the highest type receives efficient quality (no distortion at the top), while lower types are distorted downward to prevent imitation.
Bundling can increase a monopolist's revenue by reducing valuation variance (helpful when valuations are negatively correlated or have high variance), but its welfare effects depend on how it changes participation and total quantity.
Always compute MR=MC and consumer surplus (areas) explicitly: the prerequisites Profit Maximization and Consumer Surplus are the operational basis for all calculations.
Confusing degrees: treating second-degree mechanisms (menus) as equivalent to third-degree segmentation. Why wrong: second-degree relies on self-selection under private information and requires IC constraints; third-degree uses observed group identifiers and sets simple group prices.
Assuming price discrimination always reduces welfare. Why wrong: first-degree PD is allocatively efficient and can eliminate deadweight loss; third-degree PD can increase output in some segments and reduce DWL relative to single-price monopoly.
Forgetting individual rationality in two-part tariffs and versioning. Why wrong: lump-sum fees cannot exceed a consumer's surplus at the chosen marginal price without causing exit, so ignoring IR yields infeasible contracts.
Setting per-unit price p>MC automatically in two-part tariffs. Why wrong: for identical consumers p=MC is profit-maximizing in quantity terms; p>MC can be used for screening heterogeneous types but trades off efficiency for extractable surplus.
Easy — Third-degree pricing: A monopolist serves two markets with demands Q1=100-2P1 and Q2=80-4P2. Marginal cost MC=10. Find the profit-maximizing prices P1 and P2 under third-degree discrimination.
Hint: Convert to inverse demand P_i(Q_i), compute MR_i and set MR_i=MC for each market.
Inverse demands: P1=50-0.5Q1, P2=20-0.25Q2. MR1=50-Q1, MR2=20-0.5Q2. Set MR1=MC => 50-Q1=10 => Q1=40 => P1=50-0.5*40=30. MR2=MC => 20-0.5Q2=10 => Q2=20 => P2=20-0.25*20=15.
Medium — Two-part tariff with three consumers: Three consumers have willingness-to-pay intercepts a={120,80,40} with identical unit slope b=1 so each has inverse demand P_i(q)=a_i - q. MC=0. The monopolist can set (p,F) uniform across consumers. What p and F maximize profit if the firm wants to sell to the two highest types but exclude the lowest? Compute resulting profit.
Hint: To sell to types with a_i, set p so that each chosen type buys q_i=a_i - p. Fee F is at most the smaller consumer surplus of the two targeted types. Consider p=MC=0 as candidate and also p>0 to screen the low type.
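If p=0, q_1=120, q_2=80, and the consumer surpluses are 0.5*120^2=7200 and 0.5*80^2=3200; to sell to both, F≤3200, so profit=2*3200=6400. If instead p=20, q_1=100, q_2=60, CS1=0.5*100*100=5000, CS2=0.5*60*60=1800 => F≤1800 => fee revenue=2*1800=3600 plus per-unit revenue p(q1+q2)=20*160=3200 => total=6800. Thus p=20, F=1800 yields profit 6800 > 6400. In general, with F=0.5(80-p)^2, profit is π(p) = (80-p)^2 + p(200-2p) = 6400 + 40p - p^2, which is maximized at p=20, so this is the optimal tariff for serving the two highest types. The lowest type stays out because her surplus at p=20 is 0.5*20^2=200 < 1800. The optimum sets p>MC because the per-unit margin collects more from the high type, who buys more, than the firm gives up by lowering the common fee.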
Hard — Versioning with continuum of types: Consumers have type θ uniformly distributed on [0,1]. Utility from quality q at price p is U=θq - p. Cost of quality per consumer is C(q)=k q where k>0. The seller can offer a continuum menu (q(θ),p(θ)). Derive the first-order condition for the optimal quality schedule q(θ) using the Mirrlees approach (virtual surplus), show whether quality is distorted for low types, and state the condition for no distortion.
Hint: Use the revelation principle to restrict to truthful direct mechanisms. The seller’s expected profit is integral of (p(θ)-C(q(θ)))f(θ)dθ subject to IC (which implies envelope theorem: U'(θ)=q(θ)) and IR. Use integration by parts to write profit as expected virtual surplus: maximize ∫[θ q(θ) - C(q(θ)) - (1-F(θ))/f(θ) q(θ)] f(θ)dθ.
Under truth-telling, the envelope theorem yields consumer utility U(θ)=U(0)+∫_0^θ q(t) dt, with U(0)=0 at the optimum by IR. Prices follow from p(θ)=θ q(θ) - U(θ). Expected profit is ∫[p(θ)-C(q(θ))] f(θ)dθ = ∫[θ q(θ) - U(θ) - C(q(θ))] f(θ)dθ. Integrating by parts, ∫U(θ)f(θ)dθ = U(0) + ∫(1-F(θ)) q(θ) dθ, so profit equals ∫[ (θ - (1-F(θ))/f(θ)) q(θ) - C(q(θ)) ] f(θ) dθ - U(0). The term φ(θ)=θ - (1-F(θ))/f(θ) is the virtual valuation, and the seller maximizes the integrand pointwise: at an interior solution φ(θ) = C'(q(θ)). For uniform [0,1], f=1 and F=θ, so φ(θ)=θ - (1-θ) = 2θ - 1. With the linear cost C(q)=kq, C'(q)=k is constant and the integrand (2θ - 1 - k) q(θ) is linear in q, so the optimum is bang-bang: assuming quality is bounded above by some q_max, q(θ)=q_max when 2θ - 1 > k, that is θ > (1+k)/2, and q(θ)=0 otherwise (with k<1 needed for any sales). The first best serves every type with θ > k, so types between k and (1+k)/2 are inefficiently excluded. If cost were instead strictly convex, for example C(q)=k q^2/2, the first-order condition 2θ - 1 = k q gives q(θ) = (2θ - 1)/k for θ > 1/2 and zero otherwise, compared with the first best θ/k: every type below the top receives less quality than is efficient. In both cases the distortion comes from the information-rent term (1-F(θ))/f(θ), which vanishes only at the top type θ=1; hence there is no distortion at the top, and no distortion anywhere only in the degenerate case where the seller knows the type.
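To see the size of the distortion type by type, the sketch below compares second-best and first-best quality for θ uniform on [0,1]. It assumes a quadratic cost C(q) = k q^2/2, which is a swapped-in assumption rather than the linear cost stated in the exercise, chosen because it yields an interior first-order condition φ(θ) = k q. The function names and k = 0.5 are hypothetical.

```python
def virtual_value(theta: float) -> float:
    """Virtual valuation for theta uniform on [0, 1]: theta - (1 - F)/f = 2*theta - 1."""
    return 2 * theta - 1


def second_best_quality(theta: float, k: float) -> float:
    """Seller's optimal quality under the ASSUMED cost C(q) = k*q**2/2: phi(theta) = k*q."""
    return max(virtual_value(theta), 0.0) / k


def first_best_quality(theta: float, k: float) -> float:
    """Efficient quality under the same assumed cost: theta = k*q."""
    return theta / k


k = 0.5   # hypothetical cost parameter
for theta in (0.4, 0.75, 1.0):
    print(theta, second_best_quality(theta, k), first_best_quality(theta, k))
```

Types below one half are shut out, the gap between the two schedules shrinks as θ rises, and it closes exactly at θ = 1.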
This lesson builds directly on two prerequisites. In Profit Maximization, we learned to set marginal revenue equal to marginal cost and to solve constrained optimization problems; those tools are used repeatedly (MR=MC in each segment for third-degree, FOC derivations for two-part tariffs and menus, and constrained maximization with IC/IR in versioning). In Consumer Surplus, we learned how to compute areas under demand curves and evaluate deadweight loss and surplus transfers; computing CS is essential for setting lump-sum fees and evaluating welfare impacts. Looking forward, mastering price discrimination enables correct reasoning in mechanism design (optimal auctions and menus), dynamic/personalized pricing (where firms approach first-degree PD via data), Ramsey pricing and regulated monopoly pricing (elasticity–markup tradeoffs), and antitrust analysis of discriminatory pricing practices. In short: price discrimination is the bridge from static monopoly pricing (Profit Maximization) and welfare accounting (Consumer Surplus) to optimal mechanism design, regulatory economics, and modern data-driven pricing strategies.