Quadratic Forms, Matrix Norms and Optimization

Andrew Thangaraj

Aug-Nov 2020

Recap

  • Vector space \(V\) over a scalar field \(F= \mathbb{R}\) or \(\mathbb{C}\)
  • \(m\times n\) matrix A represents a linear map \(T:F^n\to F^m\)
    • dim null \(T+\) dim range \(T=\) dim \(V\)
    • Solution to \(Ax=b\) (if it exists): \(u+\) null\((A)\)
  • Four fundamental subspaces of a matrix
    • Column space, row space, null space, left null space
  • Eigenvalue \(\lambda\) and Eigenvector \(v\): \(Tv=\lambda v\)
    • There is a basis w.r.t. which a linear map is upper-triangular
    • If there is a basis of eigenvectors, linear map is diagonal w.r.t. it
  • Inner products, norms, orthogonality and orthonormal basis
    • There is an orthonormal basis w.r.t. which a linear map is upper-triangular
    • Orthogonal projection: distance from a subspace
  • Adjoint of a linear map: \(\langle Tv,w\rangle=\langle v,T^*w\rangle\)
    • null \(T=\) \((\)range \(T^*)^{\perp}\)
  • Self-adjoint: \(T=T^*\), Normal: \(TT^*=T^*T\)
    • Eigenvectors corresponding to different eigenvalues are orthogonal
  • Complex/real spectral theorem: \(T\) is normal/self-adjoint \(\leftrightarrow\) orthonormal basis of eigenvectors

Simplifying quadratic forms

Consider real vector spaces for this lecture (\(F=\mathbb{R}\))

\(A\): \(n\times n\) real symmetric matrix

Quadratic form: \(x^TAx\) for \(x\in\mathbb{R}^n\)

Example: \(A=\begin{bmatrix}1&4\\4&1\end{bmatrix}\)

\(x^TAx=x_1^2+8x_1x_2+x_2^2\)

Can \(x_1x_2\) term be eliminated by changing variables?

Example: Let \(x_1=s_1+s_2\), \(x_2=s_1-s_2\)

\(x^TAx=10s_1^2-6s_2^2\)

How does the above generalise?

Using spectral theorem to simplify quadratic forms

\(A\): \(n\times n\) real symmetric matrix

\(\{e_1,\ldots,e_n\}\): orthonormal eigenvector basis

\(Ae_i=\lambda_ie_i\)

\(S=\begin{bmatrix} \vdots&\cdots&\vdots\\ e_1&\cdots&e_n\\ \vdots&\cdots&\vdots \end{bmatrix}\), \(S^{-1}=S^T=\begin{bmatrix} \cdots&e^T_1&\cdots\\ \vdots&\vdots&\vdots\\ \cdots&e^T_n&\cdots\\ \end{bmatrix}\)

\(A=SDS^T\), where \(D=\begin{bmatrix} \lambda_1&&0\\ &\ddots&\\ 0&&\lambda_n \end{bmatrix}\)

\(x^TAx = x^TSDS^Tx = s^TDs\), where \(s=S^Tx\)

\(x^TAx = \lambda_1s_1^2+\cdots+\lambda_ns_n^2\)

Constrained maximisation/minimisation of quadratic forms

\(\max\limits_{\lVert x\rVert=1}x^TAx,\quad \min\limits_{\lVert x\rVert=1}x^TAx\)

  • Without any constraint, max is \(0\)/\(\infty\) and min is \(-\infty\)/\(0\)

  • \(\max\dfrac{x^TAx}{x^Tx}=\max\limits_{\lVert x\rVert=1}x^TAx\)

Example

\(\max\limits_{x^2+y^2=1}4xy\)

  • Rectangle of largest area inscribed within a circle

  • Various methods to solve it

  • How to generalise?

Spectral theorem and maximisation/minimisation

\(A\): \(n\times n\) real symmetric matrix

\(\{e_1,\ldots,e_n\}\): orthonormal eigenvector basis

\(A=\lambda_1e_1e^T_1+\cdots+e_ne^T_n\)

\(\lambda_1\ge \cdots \ge \lambda_n\)

Let \(x=x_1e_1+\cdots+x_ne_n\), \(\lVert x\rVert^2=x^2_1+\cdots+x^2_n\)

\(x^TAx=\lambda_1x^2_1+\cdots+\lambda_nx^2_n\)

Constrained maximum

\(\max\limits_{\lVert x\rVert=1}x^TAx=\lambda_1\) at \(x=e_1\)

Constrained minimum

\(\min\limits_{\lVert x\rVert=1}x^TAx=\lambda_n\) at \(x=e_n\)

Orthogonality constraints

\(A\): \(n\times n\) real symmetric matrix

\(\{e_1,\ldots,e_n\}\): orthonormal eigenvector basis

\(A=\lambda_1e_1e^T_1+\cdots+e_ne^T_n\)

\(\lambda_1\ge \cdots \ge \lambda_n\)

Additional constraint: \(\langle x,e_1\rangle=0\)

\(\max\limits_{\lVert x\rVert=1,\langle x,e_1\rangle=0}x^TAx\)

Let \(x=x_2e_2+\cdots+x_ne_n\), \(\lVert x\rVert^2=x^2_2+\cdots+x^2_n\)

\(x^TAx=\lambda_2x^2_2+\cdots+\lambda_nx^2_n\)

Constrained maximum

\(\max\limits_{\lVert x\rVert=1,\langle x,e_1\rangle=0}x^TAx=\lambda_2\) at \(x=e_2\)

Matrix norm

\(A\): \(m\times n\) matrix

Norm of \(A\), denoted \(\lVert A\rVert\), is defined as \(\lVert A\rVert=\max\limits_{\lVert x\rVert=1}\lVert Ax\rVert\)

  • \(\lVert A\rVert=\max\limits_{x}\dfrac{\lVert Ax\rVert}{\lVert x\rVert}\)

  • Norm: maximum increase in norm from \(x\) to \(Ax\)

  • Also interesting: \(\min\limits_{x}\dfrac{\lVert Ax\rVert}{\lVert x\rVert}\)

Matrix norm and quadratic forms

\(\lVert A\rVert^2=\max\limits_{\lVert x\rVert=1}\lVert Ax\rVert^2\)

\(\lVert Ax\rVert^2=\langle Ax,Ax\rangle=x^TA^TAx\)

Quadratic form defined by \(A^TA\)

\(A^TA\): symmetric, positive

Largest eigenvalue: \(\lambda_{\max}(A^TA)\ge0\)

\(\lVert A\rVert = \sqrt{\lambda_{\max}(A^TA)}\)

Optimising smooth multivariate functions

Consider \(f(x_1,\ldots,x_n)\), \(x_i\in\mathbb{R}\)

\(f\): continuous single and double partial derivatives

Critical points

\((x_1,\ldots,x_n)\) s.t. \(\dfrac{\partial f}{\partial x_i}=0\) for all \(i\)

  • At critical points, the function might be a local maximum, local minimum or a saddle point

Hessian matrix

\(H=\begin{bmatrix} \dfrac{\partial f}{\partial x_1^2}&\cdots&\dfrac{\partial f}{\partial x_1\partial x_n}\\ \vdots&\ddots&\vdots\\ \dfrac{\partial f}{\partial x_n\partial x_1}&\cdots&\dfrac{\partial f}{\partial x_n^2} \end{bmatrix}\) (symmetric)

  • Local maximum at a critical point if Hessian is negative definite

  • Local minimum at a critical point if Hessian is positive definite

  • Saddle point if Hessian is neither positive nor negative definite