Constrained optimization and Lagrange multipliers

This tutorial presents an introduction to optimization problems that involve finding a maximum or a minimum value of an objective function f(x_1,x_2,\ldots, x_n) subject to a constraint of the form g(x_1,x_2,\ldots, x_n)=k.

Maximum and minimum

Finding optimum values of the function f(x_1,x_2,\ldots, x_n) without a constraint is a well-known problem treated in calculus courses. One normally uses the gradient to find the stationary points, and then checks all stationary points and boundary points to determine the optimum values.

Example

  • f(x,y) = 2x^2 + y^2
  • f_x(x,y) = 4x = 0
  • f_y(x,y) = 2y = 0

f(x,y) has one stationary point at (0,0).
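
As a quick check, here is a minimal SymPy sketch (the choice of SymPy is an assumption; the tutorial does not prescribe a tool) that recovers this stationary point by setting the gradient to zero:

 import sympy as sp

 x, y = sp.symbols('x y', real=True)
 f = 2*x**2 + y**2

 # Set each partial derivative to zero and solve for the stationary points.
 grad = [sp.diff(f, v) for v in (x, y)]
 print(sp.solve(grad, (x, y), dict=True))  # [{x: 0, y: 0}]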

The Hessian

A common method of determining whether or not a function has an extreme value at a stationary point is to evaluate the Hessian [3] of the function at that point, where the Hessian is defined as

H(f)= \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2} \\
\end{bmatrix}.

Second derivative test

The second derivative test determines the nature of a stationary point x according to the following rules [2]:

  • If H(f) is positive definite at the point x, then f has a local minimum at x
  • If H(f) is negative definite at the point x, then f has a local maximum at x
  • If H(f) has both negative and positive eigenvalues, then x is a saddle point
  • Otherwise the test is inconclusive

In the above example,

 H(f)=\begin{bmatrix}
4 & 0\\
0& 2
\end{bmatrix}.

Since H(f) is positive definite (its eigenvalues 4 and 2 are both positive), f(x,y) has a local minimum at (0,0).
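
The eigenvalue check can also be done mechanically; a small sketch, again assuming SymPy, that evaluates the Hessian of the example at (0,0):

 import sympy as sp

 x, y = sp.symbols('x y', real=True)
 f = 2*x**2 + y**2

 # Hessian of f, evaluated at the stationary point (0, 0).
 H = sp.hessian(f, (x, y)).subs({x: 0, y: 0})
 print(H)              # Matrix([[4, 0], [0, 2]])
 print(H.eigenvals())  # {4: 1, 2: 1} -- all positive, so a local minimum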

Constrained maximum and minimum

When finding the extreme values of f(x_1,x_2,\dots, x_n) subject to a constraint g(x_1,x_2,\dots, x_n)=k, the stationary points found above no longer apply. The new problem can be thought of as finding the extreme values of f(x_1,x_2,\dots, x_n) when the point (x_1,x_2,\dots, x_n) is restricted to lie on the surface g(x_1,x_2,\dots, x_n)=k. The value of f is maximized (or minimized) where a level surface of f touches the constraint surface, i.e., where the two surfaces have a common tangent. At such a point the gradient vectors of the two surfaces are parallel, hence

\nabla f(x_1,x_2,\dots, x_n)= \lambda \nabla g(x_1,x_2,\dots, x_n).\,

The number λ in the equation is known as the Lagrange multiplier.
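
For instance, with f(x,y) = 2x^2 + y^2 and g(x,y) = x + y (the example worked out below), this condition reads (4x, 2y) = \lambda (1, 1).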

Lagrange multiplier method

The Lagrange multiplier method solves the constrained optimization problem by transforming it into an unconstrained optimization problem of the form:

  • L(x_1,x_2,\ldots, x_n,\lambda)= f(x_1,x_2,\ldots, x_n)+\lambda (k-g(x_1,x_2,\ldots, x_n))

Finding the gradient and Hessian of L(x_1,x_2,\ldots, x_n,\lambda), as was done above, will then determine any optimum values.
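
As a sketch of this transformation (using the example worked out below for concreteness, and assuming SymPy as the symbolic tool), one can build L and solve \nabla L = 0 directly:

 import sympy as sp

 x, y, lam = sp.symbols('x y lambda', real=True)
 f = 2*x**2 + y**2   # objective
 g = x + y           # constraint function, with g = k
 k = 1

 # Lagrangian L = f + lambda*(k - g); set all partial derivatives to zero.
 L = f + lam*(k - g)
 grad_L = [sp.diff(L, v) for v in (x, y, lam)]
 print(sp.solve(grad_L, (x, y, lam), dict=True))  # [{x: 1/3, y: 2/3, lambda: 4/3}]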

Suppose we now want to find optimum values for f(x,y) = 2x^2 + y^2 subject to x + y = 1 (example from [2]).

The Lagrangian method then yields the unconstrained function

  • L(x,y,\lambda)= 2x^2 + y^2 + \lambda (1 - x - y)

The gradient for this new function is

  • \frac{\partial L}{\partial x}(x,y,\lambda)= 4x+\lambda (-1)=0
  • \frac{\partial L}{\partial y}(x,y,\lambda)= 2y+\lambda (-1)=0
  • \frac{\partial L}{\partial \lambda}(x,y,\lambda)=1-x-y=0

The stationary points can be obtained by writing the above equations in matrix form:

 \begin{bmatrix}
4 & 0 & -1 \\
0& 2 & -1 \\
1 & 1 & 0
\end{bmatrix} \begin{bmatrix}
x\\
y \\
\lambda \end{bmatrix}= \begin{bmatrix}
0\\
0\\
1
\end{bmatrix}

This results in x = 1/3, y = 2/3, λ = 4/3.
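
The same linear system can of course be solved numerically; a small NumPy sketch (NumPy being an assumption, not part of the original notes):

 import numpy as np

 # Coefficient matrix and right-hand side of the system above.
 A = np.array([[4.0, 0.0, -1.0],
               [0.0, 2.0, -1.0],
               [1.0, 1.0,  0.0]])
 b = np.array([0.0, 0.0, 1.0])
 x, y, lam = np.linalg.solve(A, b)
 print(x, y, lam)  # 0.333..., 0.666..., 1.333...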

Next we can use the Hessian, as before, to determine the type of this stationary point. Because the constraint is linear, the second partial derivatives of L with respect to x and y are just those of f:

 H=
 \begin{bmatrix}
4 & 0 \\
0 & 2
\end{bmatrix}

Since this Hessian is positive definite, the solution (1/3, 2/3, 4/3) minimizes f(x,y) = 2x^2 + y^2 subject to x + y = 1, with f(1/3, 2/3) = 2/3.
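
As a final sanity check, a hedged sketch using scipy.optimize.minimize (SciPy is an assumption; any constrained solver would do) recovers the same constrained minimum:

 import numpy as np
 from scipy.optimize import minimize

 f = lambda v: 2*v[0]**2 + v[1]**2
 constraint = {'type': 'eq', 'fun': lambda v: v[0] + v[1] - 1}

 res = minimize(f, x0=np.array([0.5, 0.5]), constraints=[constraint])
 print(res.x)    # approximately [1/3, 2/3]
 print(res.fun)  # approximately 2/3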

References

[1] T. K. Moon and W. C. Stirling. Mathematical Methods and Algorithms for Signal Processing. Prentice Hall, 2000.
[2] http://mat.gsia.cmu.edu/QUANT/NOTES/chap4/node1.html
[3] http://www.ece.tamu.edu/~chmbrlnd/Courses/ECEN601/ECEN601-Chap3.pdf