Minimum Mean Squared Error Estimation

Andrew Thangaraj

Aug-Nov 2020


  • Vector space \(V\) over a scalar field \(F\)
    • \(F\): real field \(\mathbb{R}\) or complex field \(\mathbb{C}\) in this course
  • \(m\times n\) matrix A represents a linear map \(T:F^n\to F^m\)
    • dim null \(T\) + dim range \(T\) = dim \(V\)
    • Solution to \(Ax=b\) (if it exists): \(u+\) null\((A)\)
  • Four fundamental subspaces of a matrix
    • Column space, row space, null space, left null space
  • Eigenvalue \(\lambda\) and Eigenvector \(v\): \(Tv=\lambda v\)
    • Some linear maps are diagonalizable
  • Inner products, norms, orthogonality and orthonormal basis
    • Upper triangular matrix for a linear map over an orthonormal basis
    • Orthogonal projection gives closest vector in the subspace
    • Least squares solution to a linear equation is orthogonal projection

Inner product space of bivariate functions

\(V\): \(m\times n\) matrix or table with real entries

Vector: \(f=\begin{matrix} f_{11}&f_{12}&\cdots&f_{1n}\\ f_{21}&f_{22}&\cdots&f_{2n}\\ \vdots&\vdots&\vdots&\vdots\\ f_{m1}&f_{m2}&\cdots&f_{mn} \end{matrix}\)

Think of \(f\) as a function of two variables tabulated as above

Weighted inner product

Weighting function: \(\begin{matrix} p_{11}&p_{12}&\cdots&p_{1n}\\ p_{21}&p_{22}&\cdots&p_{2n}\\ \vdots&\vdots&\vdots&\vdots\\ p_{m1}&p_{m2}&\cdots&p_{mn} \end{matrix}\)


\[\langle f,g\rangle=\sum_{i,j}p_{ij}f_{ij}g_{ij}\]

Subspace of univariate functions

\(U\): subspace of tables with all columns identical

\(\begin{matrix} g_{11}&g_{11}&\cdots&g_{11}\\ g_{21}&g_{21}&\cdots&g_{21}\\ \vdots&\vdots&\vdots&\vdots\\ g_{m1}&g_{m1}&\cdots&g_{m1} \end{matrix}\)

Subspace \(U\) has functions of only one variable

Orthonormal basis for \(U\)

\[\frac{1}{\sqrt{\sum_j p_{1j}}}\begin{bmatrix} 1&1&\cdots&1\\ 0&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0 \end{bmatrix}, \frac{1}{\sqrt{\sum_j p_{2j}}}\begin{bmatrix} 0&0&\cdots&0\\ 1&1&\cdots&1\\ \vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0 \end{bmatrix},\ldots, \frac{1}{\sqrt{\sum_j p_{mj}}}\begin{bmatrix} 0&0&\cdots&0\\ 0&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots\\ 1&1&\cdots&1 \end{bmatrix}\]

What is \(P_U\)?

Projection onto univariate functions

Vector: \(f=\begin{matrix} f_{11}&f_{12}&\cdots&f_{1n}\\ f_{21}&f_{22}&\cdots&f_{2n}\\ \vdots&\vdots&\vdots&\vdots\\ f_{m1}&f_{m2}&\cdots&f_{mn} \end{matrix}\)

\[P_Uf = \frac{\sum_j p_{1j}f_{1j}}{\sum_j p_{1j}}\begin{bmatrix} 1&1&\cdots&1\\ 0&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0 \end{bmatrix}+\frac{\sum_j p_{2j}f_{2j}}{\sum_j p_{2j}}\begin{bmatrix} 0&0&\cdots&0\\ 1&1&\cdots&1\\ \vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0 \end{bmatrix}+\cdots+ \frac{\sum_j p_{mj}f_{mj}}{\sum_j p_{mj}}\begin{bmatrix} 0&0&\cdots&0\\ 0&0&\cdots&0\\ \vdots&\vdots&\vdots&\vdots\\ 1&1&\cdots&1 \end{bmatrix}\]

\[P_Uf=\begin{bmatrix} \sum_j p_{j|1}f_{1j}&\sum_j p_{j|1}f_{1j}&\cdots&\sum_j p_{j|1}f_{1j}\\ \sum_j p_{j|2}f_{2j}&\sum_j p_{j|2}f_{2j}&\cdots&\sum_j p_{j|2}f_{2j}\\ \vdots&\vdots&\vdots&\vdots\\ \sum_j p_{j|m}f_{mj}&\sum_j p_{j|m}f_{mj}&\cdots&\sum_j p_{j|m}f_{mj} \end{bmatrix}\]

where \(p_{j|i}=\dfrac{p_{ij}}{\sum_j p_{ij}}\).

Estimation problem

Suppose random variables \(X\), \(Y\) take values in \(\mathcal{X}\), \(\mathcal{Y}\).

Let \(X\) and \(Y\) have a joint distribution. Assume \(\mathcal{X},\mathcal{Y}\) are discrete, finite, real.

Let \(p_{xy}=\) Pr\((X=x,Y=y)\) for \(x\in\mathcal{X},y\in\mathcal{Y}\).

\(0\le p_{xy}\le 1\)

Estimation problem

Given that X takes value \(x\), how to estimate a value for \(Y\)?

Random variables and vector spaces

\(f(X,Y)\): real-valued function of \(X\) and \(Y\)

Bivariate function with \(m=\lvert\mathcal{X}\rvert\) and \(n=\lvert\mathcal{Y}\rvert\)

Vector: \(f=\begin{matrix} f_{11}&f_{12}&\cdots&f_{1n}\\ f_{21}&f_{22}&\cdots&f_{2n}\\ \vdots&\vdots&\vdots&\vdots\\ f_{m1}&f_{m2}&\cdots&f_{mn} \end{matrix}\)

\(U=\{g(X)\}\), set of all real-valued functions of \(X\)

\(\begin{matrix} g_{1}&g_{1}&\cdots&g_{1}\\ g_{2}&g_{2}&\cdots&g_{2}\\ \vdots&\vdots&\vdots&\vdots\\ g_{m}&g_{m}&\cdots&g_{m} \end{matrix}\)

Same as the situation considered earlier!

Mean Square Error (MSE): Measure of an estimator

Given \(X=x\), set the estimate \(\widehat{Y}\) to be some function of \(X\)


Mean square error (MSE)

\[\lVert Y-g(X)\rVert^2=\sum_{x,y\in\mathcal{X}\times\mathcal{Y}}p_{xy}(y-g(x))^2\]

Same as the norm from the weighted inner product!

Good estimators have low MSE or low weighted norm.

Minimum MSE estimation

Minimum Mean Squared Error (MMSE) estimation

\(Y\): vector in \(V\), let \(\mathcal{Y}=\{y_1,\ldots,y_n\}\)

\(\begin{matrix} y_{1}&y_{2}&\cdots&y_{n}\\ y_{1}&y_{2}&\cdots&y_{n}\\ \vdots&\vdots&\vdots&\vdots\\ y_{1}&y_{2}&\cdots&y_{n}\\ \end{matrix}\)

\(g(X)\): closest from subspace \(U\) minimizes \(\lVert Y-g(X)\rVert^2\)

MMSE estimator is projection of \(Y\) onto \(U\)

MMSE: \(g(x)=\sum_j p_{y_j|x}y_{j}\)

where \(p_{y_j|x}=\dfrac{p_{x,y_j}}{\sum_j p_{x,y_j}}\)


  1. Measuring some physical quantity

    Why do you take multiple measurements and take their mean?

    Each measurement is an observation of the physical quantity plus random noise.

    Given many noisy measurements, estimate the actual value of the physical quantity.

    Taking the mean is a form of estimation.

  2. Telecommunications

    A bit is transmitted and a noisy version is received.

    Receivers estimate the transmitted bit.