Numerical Linear Algebra: Matrix Computations and Stability
Classified in Mathematics
Written on in
English with a size of 2.78 MB
Floating-Point Arithmetic and Stability
Eigendecomposition: Symmetric & Positive Definite (PD)
Exponent bias: b = 011111111111 = 1023
Grampian matrix:
Maximum exponent: maxp = 1023
Minimum exponent: minp = -1023 - 51
L2 Induced Matrix Norm:
Precision range: 2**-52 <= s < 2
Note on stability: A stable algorithm cannot overcome an ill-conditioned system; it can only guarantee precision when the system is not ill-conditioned.
Special Floating-Point Values:
np.Inf:0111111111110np.NaN:0111111111111np.Inf - np.Inf:NaNnp.Inf / np.Inf:NaNnp.Inf * 0:NaNnp.NaN + any:NaN
Backward Error: Back-calculating from the computed result to find the equivalent perturbation of the input.
Analytical inverse and the Woodbury matrix identity are both effective analytical options.
Direct and Iterative Solvers
Gaussian elimination (a direct method) is highly susceptible to roundoff errors.
LU Decomposition: Applies Gaussian elimination to [A | I | b]. It is unique when A is symmetric positive definite (PD). QR decomposition is rank-revealing.
Cholesky Decomposition:
Cholesky decomposition is superior to LU decomposition because both computation and storage requirements are halved. However, Cholesky can only be applied to symmetric positive definite (PD) matrices.
Covariance matrices are ideal for Cholesky decomposition, whereas using Cholesky for least squares is too tedious.
Precision step: y = x^p, ynext - y = x^(p-16)
Gauss-Seidel (Iterative Method): Complexity is O(n^2). It converges if:
- The spectral radius is less than 1 (the smaller, the better).
- Matrix
Ais strictly diagonally dominant and symmetric. ADis positive definite (PD).D - L - Uis positive definite (PD).
Computational Complexity and Optimization
Algorithm Complexity and Storage:
- Real-time:
O(n)algorithm complexity,O(1)storage (e.g., covariance matrix, linear regression). - Batch:
O(n)time,O(n)space (e.g., median). - Online: Variance computation.
- Recursion:
O(n log n) - Matrix-Vector Multiplication:
O(n^2) - Matrix-Matrix Multiplication:
O(n^3)
For an n * n matrix A that is invertible (nonsingular), solving the inverse directly is not simple; it leads to computational waste and numerical inaccuracy.
Orthogonality implies linear independence.
Optimization Methods
Coordinate Descent: A general iterative method for solving systems of equations by optimizing one variable x_j at a time, repeatedly cycling through all variables.
Gradient Descent vs. Coordinate Descent: Gradient descent chooses search directions well but struggles with step size. Coordinate descent does the opposite. Conjugate Gradient combines the advantages of both methods.
Singular Value Decomposition (SVD)
SVD formula:
Where U and V are semi-orthogonal matrices (UV semi-ortho).
The diagonal matrix D contains singular values ordered from largest to smallest (D diagonal singular values from largest to smallest).
Smoothing Matrices and Model Fitting
Smoothing matrices:
A smoothing matrix S is idempotent if A * yhat = 0.
The eigenvalues of smoothing matrices satisfy the interval (-1, 1], with at least one eigenvalue equal to 1.
Backfitting Additive Models: Very similar to the Gauss-Seidel method, but without strict convergence conditions.
Overfitting vs. Underfitting
Overfitting: Models are not consistent from data set to data set. That is, such models are much more likely to reflect the idiosyncrasies of a specific data set rather than the actual trends in the population of interest.
Underfitting: Models are not sufficiently rich to capture relevant variation in the data. However, underfit models are "reliable" in the sense that their fits tend to be consistent from one data set to the next.
The more flexible a model is, the higher the risk of overfitting, while the less flexible a model is, the higher the risk of underfitting.