Lecture Notes

Proximal operators

By Akshay Agrawal.

Definition

Let $f : \mathbf{R}^n \to \mathbf{R} \cup \{+\infty\}$ be a closed, convex, proper (CCP) function (that is, its epigraph is a nonempty closed convex set), and let $\lambda$ be a positive scalar. The proximal operator of the function $f$ is defined as

$$\mathbf{prox}_{\lambda f}(v) = \operatorname*{argmin}_x \left( f(x) + \frac{1}{2\lambda} \|x - v\|_2^2 \right).$$

Because the function minimized above is strongly convex, it has a unique minimizer for every $v$. The name “proximal” comes from the observation that $\mathbf{prox}_{\lambda f}(v)$ is found by trading off minimizing $f$ and proximity to $v$.
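
As a concrete example, the proximal operator of the $\ell_1$ norm is the well-known soft-thresholding operator. The sketch below (a numerical illustration, not part of the development above) checks the closed form against a brute-force minimization of the defining objective in one dimension:

```python
import numpy as np

def prox_l1(v, lam):
    # Soft-thresholding: the closed-form proximal operator of lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Brute-force check of the definition in one dimension:
# prox_{lam f}(v) = argmin_x |x| + (x - v)^2 / (2 * lam).
v, lam = 1.3, 0.5
xs = np.linspace(-3.0, 3.0, 200001)
objective = np.abs(xs) + (xs - v) ** 2 / (2 * lam)
assert abs(xs[np.argmin(objective)] - prox_l1(v, lam)) < 1e-4
```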

Properties

Separable sum

If $f(x, y) = \varphi(x) + \psi(y)$, then $\mathbf{prox}_{\lambda f}(v, w) = \left( \mathbf{prox}_{\lambda \varphi}(v), \mathbf{prox}_{\lambda \psi}(w) \right)$.

Postcomposition with an affine function

If $f(x) = \alpha \varphi(x) + b$, with $\alpha > 0$, then $\mathbf{prox}_{\lambda f}(v) = \mathbf{prox}_{\alpha \lambda \varphi}(v)$.

Precomposition with affine functions

If $f(x) = \varphi(Qx)$, where $Q$ is an orthogonal matrix, then $\mathbf{prox}_{\lambda f}(v) = Q^T \mathbf{prox}_{\lambda \varphi}(Qv)$.
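
Both properties are easy to sanity-check numerically. In the sketch below (an illustration assuming scipy is available; the generic prox is computed by a derivative-free solver, so agreement is only up to solver tolerance), $\varphi = \|\cdot\|_1$ and $Q$ is a Householder reflection:

```python
import numpy as np
from scipy.optimize import minimize

def prox_l1(v, lam):
    # Elementwise soft-thresholding: the prox of lam * ||.||_1, itself a
    # separable sum of scalar absolute-value terms.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def prox_numeric(f, v, lam):
    # Generic prox by numerically minimizing f(x) + ||x - v||^2 / (2 lam).
    obj = lambda x: f(x) + np.sum((x - v) ** 2) / (2 * lam)
    opts = {"xatol": 1e-12, "fatol": 1e-12, "maxiter": 20000, "maxfev": 20000}
    return minimize(obj, v, method="Nelder-Mead", options=opts).x

rng = np.random.default_rng(0)
v, lam = rng.standard_normal(3), 0.7

# Separable sum: the prox acts coordinatewise on a sum of per-coordinate terms.
print(prox_l1(v, lam) - np.array([prox_l1(vi, lam) for vi in v]))  # ~zeros

# Precomposition with orthogonal Q: prox_{lam f}(v) = Q^T prox_{lam phi}(Q v)
# for f(x) = phi(Q x). Here Q is a Householder reflection (orthogonal).
w = rng.standard_normal(3)
w /= np.linalg.norm(w)
Q = np.eye(3) - 2.0 * np.outer(w, w)
lhs = prox_numeric(lambda x: np.sum(np.abs(Q @ x)), v, lam)
rhs = Q.T @ prox_l1(Q @ v, lam)
print(np.max(np.abs(lhs - rhs)))  # ~0, up to solver tolerance
```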

Fixed points

Assume $f$ is CCP. The minimizers of $f$ are precisely the fixed points of $\mathbf{prox}_{\lambda f}$. To see one direction, let $x^\star$ be a minimizer of $f$. Then, for every $x$,

$$f(x^\star) + \frac{1}{2\lambda} \|x^\star - x^\star\|_2^2 = f(x^\star) \leq f(x) \leq f(x) + \frac{1}{2\lambda} \|x - x^\star\|_2^2,$$

so the proximal point of $x^\star$ with respect to $f$ is $x^\star$ itself.

For the other direction, we’ll assume $f$ is subdifferentiable everywhere on its domain. Assume $x$ is a fixed point of $\mathbf{prox}_{\lambda f}$. Note that $x = \mathbf{prox}_{\lambda f}(x)$ means that $0 \in \partial f(x) + \frac{1}{\lambda}(x - x)$, i.e., $0 \in \partial f(x)$. If $0 \in \partial f(x)$, then $f(y) \geq f(x)$ for all $y$, i.e., $x$ minimizes $f$.

Fixed point iteration

The proximal operator of a CCP function $f$ is firmly nonexpansive; this means that repeatedly applying $\mathbf{prox}_{\lambda f}$ will yield a fixed point (and thus a minimizer) of $f$, provided a minimizer exists.

Moreau decomposition

The Moreau decomposition connects proximal operators with convex duality. The decomposition is

$$v = \mathbf{prox}_{f}(v) + \mathbf{prox}_{f^*}(v),$$

where $f^*$ is the conjugate of $f$.

Proof (source: these slides). Let $x = \mathbf{prox}_f(v)$ and $y = v - x$. By the optimality condition for the prox subproblem, $x = \mathbf{prox}_f(v)$ if and only if $v - x \in \partial f(x)$, i.e., $y \in \partial f(x)$. Because $f$ is CCP, $y \in \partial f(x)$ if and only if $x \in \partial f^*(y)$, i.e., $v - y \in \partial f^*(y)$, which holds if and only if $y = \mathbf{prox}_{f^*}(v)$. Hence $v = x + y = \mathbf{prox}_f(v) + \mathbf{prox}_{f^*}(v)$. $\square$

The Moreau decomposition generalizes the notion of orthogonal complements of subspaces. If $L$ is a subspace and $L^\perp$ is its orthogonal complement, then $v = \Pi_L(v) + \Pi_{L^\perp}(v)$ ($\Pi$ denotes the orthogonal projection operator). This follows from the Moreau decomposition by noting that $(I_L)^* = I_{L^\perp}$ (where $I_L$ is the indicator of $L$), $\mathbf{prox}_{I_L} = \Pi_L$, and $\mathbf{prox}_{I_{L^\perp}} = \Pi_{L^\perp}$.

In fact, the Moreau decomposition shows how convex cones play a role analogous to subspaces. If $K$ is a convex cone and $K^\circ$ is its polar cone ($K^\circ = \{ y \mid y^T x \leq 0 \text{ for all } x \in K \}$):

$$v = \Pi_K(v) + \Pi_{K^\circ}(v),$$

by an argument similar to the one made for orthogonal complements of subspaces. In particular,

$$\Pi_K(v)^T \Pi_{K^\circ}(v) = 0,$$

that is, $\Pi_K(v) \perp \Pi_{K^\circ}(v)$.
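
These identities are also easy to verify numerically. The sketch below checks the decomposition for $f = \|\cdot\|_1$ with $\lambda = 1$ (whose conjugate is the indicator of the unit $\ell_\infty$ ball, so $\mathbf{prox}_{f^*}$ is the projection onto that ball), and the cone version for $K = \mathbf{R}^n_+$, whose polar cone is $\mathbf{R}^n_-$:

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(6)

# f = ||.||_1: prox_f is soft-thresholding (threshold 1) and prox_{f*} is
# projection onto the unit l-infinity ball, i.e., elementwise clipping.
prox_f = np.sign(v) * np.maximum(np.abs(v) - 1.0, 0.0)
prox_fstar = np.clip(v, -1.0, 1.0)
assert np.allclose(prox_f + prox_fstar, v)  # v = prox_f(v) + prox_{f*}(v)

# Cone version with K the nonnegative orthant (polar: nonpositive orthant):
# the two projections sum to v and are orthogonal.
pK, pKo = np.maximum(v, 0.0), np.minimum(v, 0.0)
assert np.allclose(pK + pKo, v) and abs(pK @ pKo) < 1e-12
```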

Interpretations

As generalized projections

The proximal operator of the indicator function $I_C$ of a closed convex set $C$ is the projection operator $\Pi_C$ onto $C$. In this sense, proximal operators can be viewed as generalized projections.

Relationship to the Moreau envelope

The Moreau envelope of the function $f$, with parameter $\lambda > 0$, is defined as

$$M_{\lambda f}(v) = \inf_x \left( f(x) + \frac{1}{2\lambda} \|x - v\|_2^2 \right).$$

The point attaining the infimum above is $\mathbf{prox}_{\lambda f}(v)$. The Moreau envelope is the infimal convolution of $f$ and $\frac{1}{2\lambda}\|\cdot\|_2^2$, where the infimal convolution of functions $f$ and $g$ is defined as

$$(f \,\square\, g)(v) = \inf_x \left( f(x) + g(v - x) \right).$$

A useful fact is that $(f \,\square\, g)^* = f^* + g^*$. This means that $M_{\lambda f} = \left( f^* + \frac{\lambda}{2} \|\cdot\|_2^2 \right)^*$. Because the conjugate of a CCP function is smooth when the function is strongly convex, $M_{\lambda f}$ can be viewed as a smoothed version of $f$. Finally, using facts about conjugate functions, it can be seen that

$$\nabla M_{\lambda f}(v) = \frac{1}{\lambda} \left( v - \mathbf{prox}_{\lambda f}(v) \right),$$

or equivalently $\mathbf{prox}_{\lambda f}(v) = v - \lambda \nabla M_{\lambda f}(v)$. This provides a useful interpretation: iterating the proximal operator is essentially gradient descent (with step size $\lambda$) on a smoothed form of $f$.
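
As a quick numerical check of the gradient formula (a sketch using $f = |\cdot|$ in one dimension, where the envelope can be computed by brute force on a fine grid):

```python
import numpy as np

lam, v, h = 0.5, 1.3, 1e-6
xs = np.linspace(-5.0, 5.0, 2000001)

def envelope(t):
    # Moreau envelope of f = |.| at t, by brute-force infimum over the grid.
    return np.min(np.abs(xs) + (xs - t) ** 2 / (2 * lam))

prox = np.sign(v) * max(abs(v) - lam, 0.0)            # soft-thresholding
grad_formula = (v - prox) / lam                       # (1/lam)(v - prox(v))
grad_numeric = (envelope(v + h) - envelope(v - h)) / (2 * h)
print(grad_formula, grad_numeric)  # both approximately 1.0 for these values
```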

Resolvent

It is easy to show that

$$\mathbf{prox}_{\lambda f}(v) = (I + \lambda \partial f)^{-1}(v).$$

The right-hand side is called the resolvent of the subdifferential operator $\partial f$ (with parameter $\lambda$). It is in fact a function (i.e., it is single-valued), because the proximal operator furnishes the unique minimizer of a strongly convex function.

As generalizations of gradient and Levenberg-Marquardt updates

The proximal operator, evaluated at $v$, for the first-order Taylor expansion of a function $f$ near the point $v$ is the gradient step $v - \lambda \nabla f(v)$; the operator for the second-order Taylor expansion is the Levenberg-Marquardt update $v - \left( \nabla^2 f(v) + \frac{1}{\lambda} I \right)^{-1} \nabla f(v)$.
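
For the first-order case, this follows by setting the gradient of the (strongly convex) prox objective to zero; a short derivation:

```latex
% Prox of the first-order Taylor expansion of f about v, evaluated at v.
\hat{f}(x) = f(v) + \nabla f(v)^T (x - v), \qquad
\mathbf{prox}_{\lambda \hat{f}}(v)
  = \operatorname*{argmin}_x \; \hat{f}(x) + \tfrac{1}{2\lambda} \|x - v\|_2^2.
% Setting the gradient of the objective to zero:
0 = \nabla f(v) + \tfrac{1}{\lambda} (x - v)
  \;\;\Longrightarrow\;\; x = v - \lambda \nabla f(v).
```

The second-order case is identical, with the Hessian term contributing the $\nabla^2 f(v)$ inside the inverse.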

Algorithms

Proximal minimization

The proximal minimization algorithm is a fixed point iteration of a proximal operator:

$$x^{k+1} = \mathbf{prox}_{\lambda f}(x^k).$$

The iteration converges to a fixed point (and hence a minimizer of $f$, provided one exists) because the proximal operator of a CCP function is firmly nonexpansive.

The proximal minimization algorithm can be interpreted as gradient descent on the Moreau envelope of $f$. It can also be interpreted as disappearing quadratic regularization: as the iterates converge to a fixed point, the regularization term $\frac{1}{2\lambda} \|x - x^k\|_2^2$ goes to $0$.
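
A minimal sketch of the iteration, using $f = \|\cdot\|_1$ as the running example (so the prox is soft-thresholding and the minimizer is the origin):

```python
import numpy as np

def prox_l1(v, lam):
    # Soft-thresholding: the prox of lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Proximal minimization: repeatedly apply the prox. Each step shrinks every
# coordinate toward 0 by lam, so the iterates reach the minimizer exactly.
x, lam = np.array([2.0, -1.2, 0.3]), 0.25
for k in range(20):
    x = prox_l1(x, lam)
print(x)  # [0. 0. 0.]
```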

Gradient flow

Gradient descent and proximal minimization correspond to algorithms obtained by discretizing the gradient flow equation

$$\dot{x}(t) = -\nabla f(x(t)).$$

A forward (explicit Euler) discretization with step size $h$ yields the gradient descent update

$$\frac{x^{k+1} - x^k}{h} = -\nabla f(x^k), \quad \text{i.e.,} \quad x^{k+1} = x^k - h \nabla f(x^k),$$

while a backward (implicit Euler) discretization yields the proximal update

$$\frac{x^{k+1} - x^k}{h} = -\nabla f(x^{k+1}), \quad \text{i.e.,} \quad x^{k+1} = (I + h \nabla f)^{-1}(x^k) = \mathbf{prox}_{hf}(x^k).$$
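
The difference between the two discretizations is easy to see on $f(x) = x^2/2$, whose gradient flow is $\dot{x} = -x$ and whose prox is $\mathbf{prox}_{hf}(v) = v/(1+h)$. With a deliberately large step size, forward Euler diverges while the backward (proximal) step still decays (a small sketch):

```python
# Gradient flow of f(x) = x^2 / 2 is xdot = -x; the true solution decays to 0.
h = 2.5                    # deliberately large step size
x_fwd = x_bwd = 1.0
for k in range(10):
    x_fwd = x_fwd - h * x_fwd   # forward Euler: x+ = (1 - h) x, unstable here
    x_bwd = x_bwd / (1 + h)     # backward Euler: x+ = prox_{h f}(x) = x/(1+h)
print(x_fwd, x_bwd)  # forward oscillates and blows up; backward decays to ~0
```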

Proximal gradient method

The proximal gradient method is an algorithm for solving

$$\text{minimize} \quad f(x) + g(x),$$

where $f$ is differentiable and $g$ is proximable (and both are CCP). The algorithm is the iteration

$$x^{k+1} = \mathbf{prox}_{\lambda g} \left( x^k - \lambda \nabla f(x^k) \right).$$

When $g$ is the indicator of a convex set, the proximal gradient method reduces to the projected gradient method. The proximal gradient method converges with rate $O(1/k)$ when $\nabla f$ is Lipschitz continuous with constant $L$, as long as the step size $\lambda$ is at most $1/L$.
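
A sketch of the method on a lasso problem, minimize $\frac{1}{2}\|Ax - b\|_2^2 + \gamma \|x\|_1$, with $f$ the least-squares term and $g = \gamma \|\cdot\|_1$ (random data for illustration):

```python
import numpy as np

def prox_l1(v, lam):
    # Soft-thresholding: the prox of lam * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(0)
A, b, gamma = rng.standard_normal((40, 10)), rng.standard_normal(40), 1.0
L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f: ||A||_2^2
lam = 1.0 / L                      # step size at most 1/L
x = np.zeros(10)
for k in range(500):
    grad = A.T @ (A @ x - b)                   # forward (gradient) step on f
    x = prox_l1(x - lam * grad, lam * gamma)   # backward (prox) step on g
print(0.5 * np.sum((A @ x - b) ** 2) + gamma * np.sum(np.abs(x)))
```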

The proximal gradient method can be interpreted as a fixed point iteration of the forward-backward operator

$$\mathbf{prox}_{\lambda g} \circ \left( I - \lambda \nabla f \right).$$

Forward-backward integration of gradient flow

As was the case for proximal minimization, the proximal gradient method can be viewed as a method for integrating gradient flow. The gradient flow system is

$$\dot{x}(t) = -\nabla f(x(t)) - \nabla g(x(t)),$$

assuming for the moment that $g$ is differentiable. We discretize the flow as

$$\frac{x^{k+1} - x^k}{h} = -\nabla f(x^k) - \nabla g(x^{k+1});$$

note that we use a forward discretization for the gradient of $f$ and a backward discretization for the gradient of $g$. Rearranging, we obtain

$$x^{k+1} = (I + h \nabla g)^{-1} \left( x^k - h \nabla f(x^k) \right) = \mathbf{prox}_{hg} \left( x^k - h \nabla f(x^k) \right).$$

The resolvent $(I + h \nabla g)^{-1}$ is called a backward operator, and $I - h \nabla f$ is called a forward operator.

Acceleration

The proximal gradient method can be accelerated by adding an extrapolation step, changing the update to

$$y^{k+1} = x^k + \omega^k (x^k - x^{k-1}), \qquad x^{k+1} = \mathbf{prox}_{\lambda g} \left( y^{k+1} - \lambda \nabla f(y^{k+1}) \right).$$

For suitably chosen extrapolation weights $\omega^k$ (e.g., $\omega^k = \frac{k}{k+3}$) and step size $\lambda$, this method has an optimal (for first-order methods) rate: $O(1/k^2)$.
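
The same lasso example as above, with the extrapolation step added (a sketch, using $\omega^k = k/(k+3)$):

```python
import numpy as np

def prox_l1(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(0)
A, b, gamma = rng.standard_normal((40, 10)), rng.standard_normal(40), 1.0
lam = 1.0 / np.linalg.norm(A, 2) ** 2
x = x_prev = np.zeros(10)
for k in range(500):
    y = x + (k / (k + 3)) * (x - x_prev)   # extrapolation with omega_k
    x_prev = x
    x = prox_l1(y - lam * A.T @ (A @ y - b), lam * gamma)
print(0.5 * np.sum((A @ x - b) ** 2) + gamma * np.sum(np.abs(x)))
```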

Alternating direction method of multipliers (ADMM)

ADMM is an algorithm for solving

$$\text{minimize} \quad f(x) + g(x),$$

where $f$ and $g$ are CCP. The algorithm is the iteration

$$\begin{aligned} x^{k+1} &= \mathbf{prox}_{\lambda f}(z^k - u^k) \\ z^{k+1} &= \mathbf{prox}_{\lambda g}(x^{k+1} + u^k) \\ u^{k+1} &= u^k + x^{k+1} - z^{k+1}. \end{aligned}$$

Under mild assumptions, the sequences $x^k$ and $z^k$ converge to the same point. Hence, $u^k$ can be interpreted as a running sum of the errors $x^k - z^k$, and ADMM can be interpreted as an integral control method. ADMM can also be understood as an algorithm for minimizing the augmented Lagrangian of the lifted problem

$$\begin{array}{ll} \text{minimize} & f(x) + g(z) \\ \text{subject to} & x = z, \end{array}$$

in which case $u^k$ can be interpreted as a sequence converging to a dual variable for the equality constraint (up to a constant scale factor).
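
A sketch of the iteration on the same lasso problem as above, where $\mathbf{prox}_{\lambda f}$ for the least-squares term reduces to solving a linear system (whose matrix can be formed once and reused):

```python
import numpy as np

def prox_l1(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 10)), rng.standard_normal(40)
gamma, lam = 1.0, 1.0
# prox_{lam f}(v) for f = 0.5*||A . - b||^2 solves
# (I + lam A^T A) x = v + lam A^T b.
M = np.linalg.inv(np.eye(10) + lam * A.T @ A)
x = z = u = np.zeros(10)
for k in range(200):
    x = M @ (z - u + lam * A.T @ b)    # x-update: prox of f at z - u
    z = prox_l1(x + u, lam * gamma)    # z-update: prox of g at x + u
    u = u + x - z                      # u-update: running sum of errors
print(0.5 * np.sum((A @ z - b) ** 2) + gamma * np.sum(np.abs(z)))
```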

As a fixed-point iteration

ADMM is a fixed-point iteration for finding a point $(x, z, u)$ that the updates above leave unchanged. Fixed points satisfy

$$x = \mathbf{prox}_{\lambda f}(z - u), \qquad z = \mathbf{prox}_{\lambda g}(x + u), \qquad u = u + x - z.$$

The last equation implies $x = z$. Hence the fixed point conditions are

$$x = \mathbf{prox}_{\lambda f}(x - u), \qquad x = \mathbf{prox}_{\lambda g}(x + u),$$

which in turn mean

$$-\frac{u}{\lambda} \in \partial f(x), \qquad \frac{u}{\lambda} \in \partial g(x),$$

and in particular

$$0 \in \partial f(x) + \partial g(x).$$

This implies that $0 \in \partial (f + g)(x)$, i.e., $x$ minimizes $f + g$, so $x$ is optimal.

References

  1. Proximal Algorithms, Parikh and Boyd
  2. EE236C Lecture Notes, Lieven Vandenberghe
  3. Math 301 Lecture Notes, Emmanuel Candès