Numerical Linear Algebra: Matrix Computations and Stability

Classified in Mathematics

Written on in English with a size of 2.78 MB

Floating-Point Arithmetic and Stability

wCgqVXThjkjFQAAAABJRU5ErkJggg==

Eigendecomposition: Symmetric & Positive Definite (PD)

Exponent bias: b = 011111111111 = 1023

Grampian matrix: 8Coj0fVISuIFQAAAAASUVORK5CYII=

Maximum exponent: maxp = 1023 v0fN56DRlPBFnMAAAAASUVORK5CYII=

Minimum exponent: minp = -1023 - 51

L2 Induced Matrix Norm: tGTqpQ1MsAAAAASUVORK5CYII=

8AA5DM53COpDsAAAAASUVORK5CYII= DzmtbcOUm1CWAAAAAElFTkSuQmCC

Precision range: 2**-52 <= s < 2

Note on stability: A stable algorithm cannot overcome an ill-conditioned system; it can only guarantee precision when the system is not ill-conditioned.

Special Floating-Point Values:

  • np.Inf: 0111111111110
  • np.NaN: 0111111111111
  • np.Inf - np.Inf: NaN
  • np.Inf / np.Inf: NaN
  • np.Inf * 0: NaN
  • np.NaN + any: NaN

Backward Error: Back-calculating from the computed result to find the equivalent perturbation of the input.

Analytical inverse and the Woodbury matrix identity are both effective analytical options.

Direct and Iterative Solvers

Gaussian elimination (a direct method) is highly susceptible to roundoff errors.

LU Decomposition: Applies Gaussian elimination to [A | I | b]. It is unique when A is symmetric positive definite (PD). QR decomposition is rank-revealing.

Cholesky Decomposition: 1qY6f8Q9Z07HvuLi66K6oOw9xqPo0EHNX81frb82A3NFRBzXXs+oHNUFZRXjUXTowN0ms7zCuFAPVA56QVnFeBQdOnC3ySyvMC7UA5WDXlBWMR5Fhw7cbTLLK4wL9UDloBeUVYxH0aEDd5vM8grjQj1QOegFZRXjUXTowN0ms7zCuFAPVA56QVnFeBQdOnC3yZyv8D8RAPbzywvq2gAAAABJRU5ErkJggg==

Cholesky decomposition is superior to LU decomposition because both computation and storage requirements are halved. However, Cholesky can only be applied to symmetric positive definite (PD) matrices.

AyMEBpD9H3H7AAAAAElFTkSuQmCC

Covariance matrices are ideal for Cholesky decomposition, whereas using Cholesky for least squares is too tedious.

Precision step: y = x^p, ynext - y = x^(p-16)

Gauss-Seidel (Iterative Method): Complexity is O(n^2). It converges if:

  • The spectral radius is less than 1 (the smaller, the better).
  • Matrix A is strictly diagonally dominant and symmetric.
  • AD is positive definite (PD).
  • D - L - U is positive definite (PD).

3ytkdrB56SixEpVgx0VZoMl6UHgqOIvcajaP1cGriN0+v+z5dJYnrDLgAAAAABJRU5ErkJggg==

Computational Complexity and Optimization

Algorithm Complexity and Storage:

  • Real-time: O(n) algorithm complexity, O(1) storage (e.g., covariance matrix, linear regression).
  • Batch: O(n) time, O(n) space (e.g., median).
  • Online: Variance computation.
  • Recursion: O(n log n)
  • Matrix-Vector Multiplication: O(n^2)
  • Matrix-Matrix Multiplication: O(n^3)

For an n * n matrix A that is invertible (nonsingular), solving the inverse directly is not simple; it leads to computational waste and numerical inaccuracy.

Orthogonality implies linear independence.

Optimization Methods

Coordinate Descent: A general iterative method for solving systems of equations by optimizing one variable x_j at a time, repeatedly cycling through all variables.

Gradient Descent vs. Coordinate Descent: Gradient descent chooses search directions well but struggles with step size. Coordinate descent does the opposite. Conjugate Gradient combines the advantages of both methods.

Singular Value Decomposition (SVD)

SVD formula: wH4+080thBm6wAAAABJRU5ErkJggg==

Where U and V are semi-orthogonal matrices (UV semi-ortho).

The diagonal matrix D contains singular values ordered from largest to smallest (D diagonal singular values from largest to smallest).

n+jqehGgVuxGQAAAABJRU5ErkJggg==

Smoothing Matrices and Model Fitting

Smoothing matrices: 3pzJTShQ5wQAAAAAElFTkSuQmCC

A smoothing matrix S is idempotent if A * yhat = 0.

The eigenvalues of smoothing matrices satisfy the interval (-1, 1], with at least one eigenvalue equal to 1.

Backfitting Additive Models: Very similar to the Gauss-Seidel method, but without strict convergence conditions.

Overfitting vs. Underfitting

Overfitting: Models are not consistent from data set to data set. That is, such models are much more likely to reflect the idiosyncrasies of a specific data set rather than the actual trends in the population of interest.

Underfitting: Models are not sufficiently rich to capture relevant variation in the data. However, underfit models are "reliable" in the sense that their fits tend to be consistent from one data set to the next.

d23zykPEkdwAAAABJRU5ErkJggg==

The more flexible a model is, the higher the risk of overfitting, while the less flexible a model is, the higher the risk of underfitting.

hgIXkiSPYMAAoXsy1p9auP+OCpzV+R9CdsI2vSDsfgAAAABJRU5ErkJggg==

8L5Z5xaC2XX74AAAAASUVORK5CYII=

Related entries: