Linear Algebra Essentials: Part 2

We conclude our chapter on the adjoint of a linear operator.

Our main theorems

THEOREM:

Let V be an inner product space and T,U be linear operators on V. Then:

  • (T+U)^* = T^* + U^*
  • (cT)^* = \bar{c}T^*
  • (TU)^* = U^* T^*
  • (T^*)^* = T
  • I^* = I

The proof of this theorem is based on the definition of the adjoint: \langle T(x), y \rangle = \langle x, T^*(y) \rangle.
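For instance, the third property follows directly from the definition and the uniqueness of the adjoint: for all x, y \in V,

\langle TU(x), y \rangle = \langle U(x), T^*(y) \rangle = \langle x, U^* T^*(y) \rangle,

so (TU)^* = U^* T^*.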

COROLLARY

The above properties also hold for matrices (via the operator L_A).

APPLICATION: Least Squares Approximation

The setup of the problem is classical: given n data points (t_1, y_1), (t_2, y_2), . . . , (t_n, y_n), we want to minimize \sum\limits_{i=1}^n (y_i-ct_i-d)^2, where c is the slope of the best-fit line and d is its offset. If

A = \left[ \begin{array}{cc} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_n & 1 \end{array} \right], \quad x = \left[\begin{array}{c}c\\d\end{array}\right] \quad and \quad y = \left[\begin{array}{c}y_1\\y_2\\ \vdots \\y_n\end{array}\right]

then we are trying to minimize ||y-Ax||^2. We will study a general method for solving this minimization problem when A is an m\times n matrix; this also allows fitting general polynomials of degree n-1. We assume that m \geq n.

If we let x \in \mathbb{R}^n and y \in \mathbb{R}^m, then \langle Ax, y \rangle = \langle x, A^*y \rangle by the properties of the adjoint studied thus far. We also know that rank(A^* A) = rank(A) by the dimension theorem, because the nullities of the two matrices are the same (shown below). Thus, if A has rank n, then A^* A is invertible.
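To see that the nullities agree, note that N(A) \subseteq N(A^*A) trivially, while if A^*Ax = 0, then

0 = \langle x, A^*Ax \rangle = \langle Ax, Ax \rangle = ||Ax||^2,

so Ax = 0.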

Using these observations, our norm minimization problem can be solved through the minimality of orthogonal projections. The range W of L_A is a subspace of \mathbb{R}^m. The orthogonal projection of y \in \mathbb{R}^m onto W is a vector u = Ax_0 with the property that ||Ax_0 - y|| \leq ||Ax-y|| for all x \in \mathbb{R}^n. To find x_0, note that u-y lies in W^{\perp}, so \langle Ax, Ax_0 - y \rangle = 0 for all x \in \mathbb{R}^n. In other words, \langle x, A^*(Ax_0 - y) \rangle = 0 for all x \in \mathbb{R}^n, so A^*(Ax_0 - y) = 0 and x_0 solves the equation A^* A x_0 = A^* y. If additionally the rank of A is n, we get that:

x_0 = (A^*A)^{-1} A^* y
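As a quick numerical sketch (assuming NumPy is available; the data points below are made up purely for illustration), the normal equations A^*Ax_0 = A^*y for the line-fitting problem can be solved directly:

    import numpy as np

    # Hypothetical data points (t_i, y_i), made up for illustration
    t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Build A = [t | 1] as above, so x = (c, d) describes the line y ~ c*t + d
    A = np.column_stack([t, np.ones_like(t)])

    # Normal equations A^T A x_0 = A^T y (A is real, so A^* is the transpose)
    x0 = np.linalg.solve(A.T @ A, A.T @ y)
    c, d = x0
    print(c, d)  # slope and offset of the least squares line

In practice one would typically call np.linalg.lstsq(A, y, rcond=None), which solves the same problem and also handles rank-deficient A.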

APPLICATION: Minimal solutions to Systems of Linear Equations

How do we find minimal norm solutions to systems of linear equations? We prove the following theorem:

Theorem

Let A \in M_{m \times n}(F), b \in F^m and suppose that the system Ax=b is consistent. Then:

  • There exists exactly one minimal norm solution s of Ax=b, and s\in R(L_A^*).
  • The vector s is the only solution of Ax=b that lies in R(L_A^*). So, if u satisfies (A A^*)u = b, then s = A^* u.

In other words, the minimal norm solution of a system Ax=b can be found by finding any solution u of the system AA^* u = b and setting s = A^* u.

Proof

The theorem follows relatively easily from the minimality of the projection onto a subspace, as before. We do, however, need the fact that N(L_A) = (R(L_A^*))^{\perp}, which holds for all linear transformations; it is easily established, so we leave it as an exercise.
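A minimal numerical sketch of this recipe (assuming NumPy; the system below is an arbitrary consistent example with full row rank, so AA^* is invertible):

    import numpy as np

    # An arbitrary consistent system Ax = b with full row rank (m = 2, n = 3)
    A = np.array([[1.0, 2.0, 1.0],
                  [0.0, 1.0, 3.0]])
    b = np.array([4.0, 5.0])

    # Solve (A A^*) u = b, then the minimal norm solution is s = A^* u
    u = np.linalg.solve(A @ A.T, b)
    s = A.T @ u

    print(s)                      # the minimal norm solution
    print(np.allclose(A @ s, b))  # True: s really solves the system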

Introduction to Spectral Graph Theory: Courant-Fischer Theorem

The following theorem characterizes the eigenvalues of a symmetric matrix in terms of the Rayleigh quotient: each eigenvalue is the solution of a particular optimization problem.

The Courant-Fischer Theorem

Let M be a symmetric matrix with eigenvalues \mu_1 \geq \mu_2 \geq \cdots \geq \mu_n. Then:

\mu_k = \max\limits_{\substack{S\subset \mathbb{R}^n \\ dim(S) = k}} \min\limits_{\substack{x\in S \\ x\neq 0}} \frac{x^T M x}{x^T x}.

Proof

  • Step 1: Let y_1, . . . , y_n be an orthonormal eigenbasis of M, where y_i belongs to the eigenspace of \mu_i. Then we can write x = \sum\limits_{i=1}^n c_i y_i, where c_i = \langle y_i, x \rangle.
  • Step 2: We have that: x^T M x = x^T M \left(\sum\limits_{i=1}^n c_i y_i\right) = x^T \left[\sum\limits_{i=1}^n c_i (M y_i) \right] = x^T \left (\sum\limits_{i=1}^n c_i \mu_i y_i\right) = \sum\limits_{i=1}^n c_i^2 \mu_i, as our basis is orthonormal.
  • Step 3: We prove that for some subspace S of dimension k we have that

\min\limits_{\substack{x\in S \\ x\neq 0}} \frac{x^T M x}{x^T x} \geq \mu_k.

If we let S = span(y_1, . . . , y_k), then every x \in S can be written as x = \sum\limits_{i=1}^k c_i y_i, and using Step 2 we get:

\frac{x^T M x}{x^T x} = \frac{\sum\limits_{i=1}^k c_i^2 \mu_i}{\sum\limits_{i=1}^k c_i^2} \geq \frac{\sum\limits_{i=1}^k c_i^2 \mu_k}{\sum\limits_{i=1}^k c_i^2} = \mu_k

  • Step 4: We prove that for every subspace S of dimension k it is true that

\min\limits_{\substack{x\in S \\ x\neq 0}} \frac{x^T M x}{x^T x} \leq \mu_k

Consider T = span(y_k, . . . , y_n), which has dimension n-k+1. Since dim(S) + dim(T) = n+1 > n, the intersection S\cap T has dimension at least 1. So:

\min\limits_{\substack{x\in S \\ x\neq 0}} \frac{x^T M x}{x^T x} \leq \min\limits_{\substack{x\in S\cap T\\ x\neq 0}} \frac{x^T M x}{x^T x} \leq \max\limits_{\substack{x\in T\\ x\neq 0}} \frac{x^T M x}{x^T x}

Writing x = \sum\limits_{i=k}^{n} c_i y_i for any x \in T, we get that:

\frac{x^T M x}{x^T x} = \frac{\sum\limits_{i=k}^n c_i^2 \mu_i}{\sum\limits_{i=k}^n c_i^2} \leq\frac{\sum\limits_{i=k}^n c_i^2 \mu_k}{\sum\limits_{i=k}^n c_i^2} = \mu_k

Steps 3 and 4 together complete the proof.
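A small numerical check of the theorem (assuming NumPy; the symmetric matrix is random and purely illustrative). Within a subspace spanned by the orthonormal columns of a matrix Q, the minimum of the Rayleigh quotient equals the smallest eigenvalue of Q^T M Q:

    import numpy as np

    rng = np.random.default_rng(0)

    # A random symmetric matrix, purely for illustration
    B = rng.standard_normal((5, 5))
    M = (B + B.T) / 2

    # Eigenvalues in decreasing order mu_1 >= ... >= mu_n, eigenvectors as columns
    mu, Y = np.linalg.eigh(M)       # eigh returns them in increasing order
    mu, Y = mu[::-1], Y[:, ::-1]

    def min_rayleigh(S):
        # Minimum of x^T M x / x^T x over the column span of S: orthonormalize S
        # and take the smallest eigenvalue of Q^T M Q
        Q, _ = np.linalg.qr(S)
        return np.linalg.eigvalsh(Q.T @ M @ Q).min()

    k = 3
    # Step 3: the span of y_1, . . . , y_k attains mu_k
    print(np.isclose(min_rayleigh(Y[:, :k]), mu[k - 1]))            # True
    # Step 4: random k-dimensional subspaces never do better than mu_k
    print(all(min_rayleigh(rng.standard_normal((5, k))) <= mu[k - 1] + 1e-9
              for _ in range(100)))                                 # True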

Linear Algebra Essentials: Part 1

1. An introduction

Theorem 1

Let V be a finite-dimensional inner product space over F and let g\,:\,V\rightarrow F be a linear transformation. Then there exists a unique vector y \in V such that g(x) = \langle x,y \rangle for all x \in V.

Definition

Let T be a linear operator on an inner product space V. The unique linear operator T^* such that \langle T(x), y\rangle = \langle x, T^*(y)\rangle for all x, y \in V is called the adjoint of T.

Theorem 2

The adjoint of any linear operator on a finite-dimensional inner product space exists.

Theorem 3

Let V be a finite-dimensional inner product space, and let \beta be an orthonormal basis for V. Then [T^*]_\beta = [T]_\beta^*.

Corollary: L_{A^*} = (L_A)^*

2. Proofs to the Theorems

PROOF OF THEOREM 1 (every linear functional is given by an inner product)

Let \beta=\{u_1,u_2, . . . ,u_n\} be an orthonormal basis for V and define

y = \sum\limits_{i=1}^n \overline{g(u_i)}u_i.

Then let h(x) := \langle x,y\rangle. For i \in [n] we have: h(u_i) = \langle u_i, \sum\limits_{j=1}^n \overline{g(u_j)}u_j \rangle = \sum\limits_{j=1}^n g(u_j) \langle u_i, u_j \rangle = g(u_i). Since h and g agree on every element of \beta, they are equal as linear transformations. The uniqueness of y follows easily: if \langle x, y \rangle = \langle x, y' \rangle for all x \in V, then y - y' is orthogonal to every vector of V, so y = y'.
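As a concrete illustration, take V = \mathbb{R}^3 with the standard inner product and the standard (orthonormal) basis e_1, e_2, e_3, and the illustrative functional g(x) = 2x_1 - x_2 + 3x_3. Then y = \sum\limits_{i=1}^3 \overline{g(e_i)}e_i = (2, -1, 3)^T, and indeed g(x) = \langle x, y \rangle for every x.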

PROOF OF THEOREM 2 (existence and uniqueness of the adjoint linear operator)

Let y \in V. Define g(x) = \langle T(x), y \rangle. It is easy to see that g is linear. So, by Theorem 1, there exists a y^* \in V such that g(x) = \langle x, y^* \rangle. Defining T^*(y) = y^* gives us the adjoint. Linearity and uniqueness follow easily.

PROOF OF THEOREM 3 (the matrix of the adjoint is the conjugate transpose of the matrix of T)

If \beta=\{u_1,u_2, . . . ,u_n\}, then ([T^*]_\beta)_{ij} = \langle T^*(u_j), u_i \rangle = \overline{\langle u_i, T^*(u_j) \rangle} = \overline{\langle T(u_i), u_j \rangle} = \overline{([T]_\beta)_{ji}} = ([T]_\beta^*)_{ij}.
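A quick numerical sanity check of Theorem 3 and its corollary (assuming NumPy; the matrix and vectors are random and purely illustrative). With the standard inner product on \mathbb{C}^n, whose standard basis is orthonormal, the matrix of (L_A)^* is the conjugate transpose A^*:

    import numpy as np

    rng = np.random.default_rng(1)

    # Random complex matrix and vectors, purely for illustration
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

    def inner(a, b):
        # Standard inner product on C^n: linear in a, conjugate-linear in b
        return np.vdot(b, a)    # np.vdot conjugates its first argument

    A_star = A.conj().T         # conjugate transpose of A

    # <Ax, y> = <x, A^* y>, so the matrix of (L_A)^* in the standard basis is A^*
    print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))  # True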