Linear systems - iterative methods

  • Jacobi
  • Gauss-Seidel
  • SOR
  • Convergence

Iterative methods

  • Large systems ($n\ge 100$)
  • Sparse systems (mostly 0’s in $A$)
  • Diagonally dominant: $|a_{i,i}| > \sum_{j\ne i}|a_{i,j}|.$
    • Strict diagonal dominance is sufficient (though not necessary) for convergence of these methods. (A quick numerical check is sketched after this list.)
  • Start with an initial guess. Then correct that guess iteratively: $x_0\rightarrow x_1\rightarrow\ldots\rightarrow x_n\approx x$.
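
As a quick aside, the strict diagonal dominance condition is easy to check numerically; a minimal sketch (the function name is illustrative) is:

import numpy as np

def is_strictly_diag_dominant(A):
    '''Return True if |a_ii| > sum over j != i of |a_ij| for every row i (illustrative check).'''
    d   = np.abs(np.diag(A))                 # |a_ii| for each row
    off = np.sum(np.abs(A), axis=1) - d      # sum of off-diagonal magnitudes in each row
    return np.all(d > off)

A = np.array([[ 4., -1.,  0.],
              [-1.,  4., -1.],
              [ 0., -1.,  4.]])
print(is_strictly_diag_dominant(A))          # True: each |a_ii|=4 exceeds its row's off-diagonal sum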

Jacobi method

$Ax=b$

$$\left[\begin{array}{cccc} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\ a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4} \end{array}\right] \left[\begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \end{array}\right]= \left[\begin{array}{c} b_{1} \\ b_{2} \\ b_{3} \\ b_{4} \end{array}\right]$$

$$\rightarrow \sum_j a_{i,j}x_j = b_i,\quad i=1,\ldots,n.$$

$$a_{i,1}x_1 + a_{i,2}x_2 + a_{i,3}x_3 + a_{i,4}x_4 = b_i,\quad i=1,2,3,4.$$

Solve for $x_i$:

  • i = 1: $$x_1 = [b_1 - (a_{1,2}x_2 + a_{1,3}x_3 + a_{1,4}x_4)]/a_{1,1}.$$

  • i = 2: $$x_2 = [b_2 - (a_{2,1}x_1 + a_{2,3}x_3 + a_{2,4}x_4)]/a_{2,2}.$$

  • etc. $$\ldots$$

  • In general: $$ x_i = \frac{1}{a_{i,i}}\left(b_i - \sum_{j=1}^{i-1}a_{i,j}x_j - \sum_{j=i+1}^{n}a_{i,j}x_j\right)$$

  • Solution approach:

    • Get an initial guess for $x \rightarrow x^0$. (The $0$ is a superscript labeling the iteration number, not a power.)
    • Evaluate the right hand side (RHS) of the above equation.
    • This is then the new guess $x^1$.
    • Repeat.
  • Change the form of the above equation for convenience: add and subtract $x_i^k$ on the RHS, that is, add $\pm a_{i,i}x_i^k/a_{i,i}$ to the RHS.

    • The equation above needs two sums because the diagonal term is missing from them (it was solved for). The $\pm a_{i,i}x_i^k$ term fills in the missing diagonal entry, so the two sums combine into a single sum over all $j$: $$ x_i^{k+1} = x_i^k + \frac{1}{a_{i,i}}\left(b_i - \sum_{j=1}^na_{i,j}x_j^k\right).$$
  • Note, the term in parentheses is the residual:

$$R_i^k = \left(b_i - \sum_{j=1}^na_{i,j}x_j^k\right).$$

and we can write $$x_i^{k+1} = x_i^k + \frac{1}{a_{i,i}}R_{i}^k.$$

  • $Ax=b$, so $b-Ax=0$, but $b-Ax^k\ne0$ (since we are not converged yet); rather $b-Ax^k = R^k$. At convergence $R^k\rightarrow 0$.

  • That is, iterations drive the residual (error) to zero.

  • This happens faster if we start with a better guess for $x$.

  • For the Jacobi method, we can write the method in matrix form as $$Dx^{k+1} = -(L+U)x^k + b,$$ where $A = L+D+U$, with $L$ the strictly lower triangular part of $A$, $D$ the diagonal of $A$, and $U$ the strictly upper triangular part of $A$.

  • To code the Jacobi method, we store the old $x^k$ array (called xold, say), and fill in a separate $x^{k+1}$ array (called xnew, say). Then we set xold=xnew before the start of the next iteration.

    • We loop over the update equation above to fill in xnew. (A sketch using the matrix-splitting form above follows this list.)
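
As a minimal sketch (not the full solver given in the Example Code section below), the matrix-splitting form $Dx^{k+1} = -(L+U)x^k + b$ could be coded as follows; the function name and the simple relative-change stopping test are illustrative choices.

import numpy as np

def jacobi_split(A, b, x, tol=1E-5, maxit=10000):
    '''Jacobi iteration written via the splitting A = L + D + U (illustrative sketch).'''
    d  = np.diag(A)                          # diagonal entries of A
    LU = A - np.diag(d)                      # L + U: everything off the diagonal
    for k in range(1, maxit+1):
        xnew = (b - np.dot(LU, x))/d         # solve D x^{k+1} = -(L+U) x^k + b
        if np.linalg.norm(xnew - x) <= tol*np.linalg.norm(xnew):
            return xnew, k                   # converged: return solution and iteration count
        x = xnew
    return x, maxit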

Gauss-Seidel method

  • This is similar to Jacobi, but we update $x_i$ as we go along.

  • On the RHS, use $x^{k+1}$ as it becomes available, instead of always using the old values. $$x_i^{k+1} = \frac{1}{a_{i,i}}\left(b_i - \sum_{j=1}^{i-1}a_{i,j}x_j^{k+1}-\sum_{j=i+1}^na_{i,j}x_j^k\right).$$

  • Notice, in the first summation, for $j<i$, we use $x_j^{k+1}$, which are newly updated. In the second summation, for $j>i$, we use $x_j^k$, which are old, since we have not updated these yet.

  • Rearrange this equation by $\pm a_{i,i}x_i^k$, to obtain: $$x_i^{k+1} = x_i^k + \frac{1}{a_{i,i}}\left(b_i - \sum_{j=1}^{i-1}a_{i,j}x_j^{k+1}-\sum_{j=i}^na_{i,j}x_j^k\right).$$

  • The term in parentheses is the residual: $$x_i^{k+1} = x_i^k + \frac{1}{a_{i,i}}R_i^k,$$ $$R_i^k = \left(b_i - \sum_{j=1}^{i-1}a_{i,j}x_j^{k+1}-\sum_{j=i}^na_{i,j}x_j^k\right).$$

  • Note, the update equation is the same as for Jacobi, but with a different residual calculation (see the sketch after this list).

  • This method converges faster than Jacobi.

    • So why would we ever use Jacobi? We wouldn’t; we mainly use it as a step in motivating the Gauss-Seidel method.
  • This can be written in matrix form as $$(L+D)x^{k+1} = -Ux^k + b,$$ or $$Bx = c,$$ where $B=L+D$ and $c=-Ux^k+b$ for each iteration.

    • $L$, $D$, and $U$ are as for Jacobi above.
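
A minimal element-wise sketch of the Gauss-Seidel update above (the function name and stopping test are illustrative; x is a float array updated in place, so new components are used as soon as they become available):

import numpy as np

def gauss_seidel(A, b, x, tol=1E-5, maxit=10000):
    '''Gauss-Seidel iteration (sketch): x is a float array and is updated in place.'''
    n = len(b)
    for k in range(1, maxit+1):
        xold = x.copy()                      # keep the previous iterate for the convergence test
        for i in range(n):
            Ri = b[i] - np.dot(A[i,:], x)    # residual; x already holds x^{k+1} for j < i
            x[i] = x[i] + Ri/A[i,i]          # same update form as Jacobi, different residual
        if np.linalg.norm(x - xold) <= tol*np.linalg.norm(x):
            return x, k
    return x, maxit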

Successive over-relaxation (SOR)

Since the iterates typically approach the solution smoothly (successive corrections tend to point in the same direction), we can take a bigger step towards the solution than a given Jacobi or G.S. step provides:

$$x_i^{k+1} = x_i^k + \frac{R_i^k}{a_{i,i}} \rightarrow x_i^{k+1} = x_i^k + \Delta x_i^k,$$

with $x_i^{k+1}-x_i^k = \Delta x_i^k = R_i^k/a_{i,i}.$

The SOR method multiplies the $\Delta x_i^k$ by a factor $\omega > 1$ to take a larger step towards $x$: $$x_i^{k+1} = x_i^k + \omega\frac{R_i^k}{a_{i,i}}.$$

  • Here $R_i^k$ is the G.S. residual defined above (see the sketch after this list).

  • For $\omega < 1.0$ we have under-relaxation (take a smaller step). Under-relaxation is used in, e.g., nonlinear solvers to help stability.

  • For $\omega = 1.0$ we have Gauss-Seidel.

  • For $\omega \ge 2.0$ the method diverges (see Morton and Mayers for a proof).

  • Note, SOR can also be written as $$x_i^{k+1} = (1-\omega)x_i^k + \omega(x_i^{k+1})_{G.S.}.$$

    • This is the Lever Rule
    • A linear interpolation/extrapolation between $(x_i^k)$ and $(x_i^{k+1})_{G.S.}.$
  • There is some optimal $\omega$ between $1$ and $2$.

    • For a broad class of matrices, $$\omega_{opt} = \frac{2}{1+\sqrt{1-\rho_{Jacobi}^2}},$$ where $\rho_{Jacobi}$ is the spectral radius (magnitude of the largest eigenvalue) of the Jacobi iteration matrix.
    • In some cases $\omega_{opt}$ can be found analytically.
    • See Numerical Recipes, 3rd edition, section 20.5.
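
A minimal SOR sketch, obtained from the Gauss-Seidel sweep above by scaling each correction with $\omega$; the default $\omega=1.5$ and the function name are illustrative choices only:

import numpy as np

def sor(A, b, x, omega=1.5, tol=1E-5, maxit=10000):
    '''SOR iteration (sketch): a Gauss-Seidel sweep with each correction scaled by omega.'''
    n = len(b)
    for k in range(1, maxit+1):
        xold = x.copy()
        for i in range(n):
            Ri = b[i] - np.dot(A[i,:], x)    # G.S.-style residual (uses new x_j for j < i)
            x[i] = x[i] + omega*Ri/A[i,i]    # omega = 1 recovers Gauss-Seidel
        if np.linalg.norm(x - xold) <= tol*np.linalg.norm(x):
            return x, k
    return x, maxit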

Convergence

  • $Ax=b$

    1. $Mx_{n+1} = Nx_n+b$ (iterative form, where $n$ is the iteration number).
    • At convergence, $x_{n+1} = x_n = x$, so
    2. $Mx = Nx + b \rightarrow (M-N)x=b \rightarrow A=M-N$.
  • (1)-(2) $\rightarrow M\epsilon_{n+1} = N\epsilon_n$,

    • where $\epsilon_n = x_n - x$ is the error vector on iteration $n$.
  • $\epsilon_{n+1} = (M^{-1}N)\epsilon_n \rightarrow \epsilon_{n+1} = Q\epsilon_n,$

    • where $Q = M^{-1}N$.
  • $\epsilon$ is a vector, and we can write it as $\epsilon = \sum_kc_k\psi_k.$

    • That is, write $\epsilon$ as a linear combination of the eigenvectors $\psi_k$ of $Q$:
    • $Q\psi_k = \lambda_k\psi_k$.

Now,

  • $\epsilon_0 = \sum_kc_k\psi_k,$

  • $\epsilon_1 = Q\epsilon_0 = Q\sum_kc_k\psi_k = \sum_kc_kQ\psi_k = \sum_kc_k\lambda_k\psi_k,$

  • $\epsilon_2 = Q\epsilon_1 = Q\sum_kc_k\lambda_k\psi_k = \sum_kc_k\lambda_kQ\psi_k = \sum_kc_k\lambda_k^2\psi_k,$

  • $\ldots$

  • $\epsilon_n = \sum_kc_k\lambda_k^n\psi_k.$

  • So, as $n\rightarrow\infty,$ $\epsilon_n\rightarrow 0$ only if all the $|\lambda_k|<1$.

  • As $n$ grows large, the term with the largest $|\lambda_k|$ dominates, so

    • $\epsilon_n\sim c_1\lambda_1^n\psi_1$,
    • with $\psi_1$ normalized so that $|\psi_1|=1$.
    • Set a tolerance $\delta$ and require $|\epsilon_n|\approx c_1|\lambda_1|^n\le\delta$; at the boundary, $\delta = c_1|\lambda_1|^n$.
    • $\log\delta = \log c_1 + n\log|\lambda_1|$
    • $$n = \frac{\log(\delta/c_1)}{\log|\lambda_1|}.$$
    • Here $|\lambda_1| = \rho$, the spectral radius.
    • As $|\lambda_1|\rightarrow 1$, $n\rightarrow\infty$.
  • So, in designing a method, choose $M$ and $N$ so that $Q=M^{-1}N$ has a nice, small spectral radius $|\lambda_1|$.

    • Make $M$ easy to invert: if $M=A$ and $N=0$, we converge in one iteration, but then that one iteration is as hard as solving the original system! (A sketch computing $\rho$ for the Jacobi splitting follows this list.)
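
As an illustration, the sketch below builds the Jacobi iteration matrix $Q = M^{-1}N$ (with $M=D$ and $N=-(L+U)$) for the penta-diagonal test matrix used in the example code below, computes its spectral radius numerically, and plugs it into the $\omega_{opt}$ formula and the iteration-count estimate above (taking $c_1\approx 1$); treat these as rough estimates.

import numpy as np

n = 60
A = np.diag(np.ones(n-3),-3) + np.diag(-np.ones(n-1),-1) + \
    np.diag(4*np.ones(n),0)  + np.diag(-np.ones(n-1),1)  + \
    np.diag(np.ones(n-3),3)

D   = np.diag(np.diag(A))                     # M = D for the Jacobi splitting
N   = D - A                                   # N = -(L+U), so that A = M - N
Q   = np.linalg.solve(D, N)                   # Q = M^{-1} N, the Jacobi iteration matrix
rho = np.max(np.abs(np.linalg.eigvals(Q)))    # spectral radius: magnitude of largest eigenvalue

delta = 1E-5                                  # error tolerance
print("Jacobi spectral radius rho:", rho)
print("omega_opt estimate        :", 2/(1 + np.sqrt(1 - rho**2)))
print("estimated iterations      :", np.log(delta)/np.log(rho))    # n = log(delta/c1)/log(rho), c1 ~ 1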

Convergence criteria

  • Absolute error
    • $|\cdot|_{\infty}\rightarrow |\Delta x_i|_{max}\le\epsilon$
    • $|\cdot|_1\rightarrow\sum_i|\Delta x_i|\le\epsilon$
    • $|\cdot|_2\rightarrow[\sum_i(\Delta x_i)^2]^{1/2}\le\epsilon.$
    • where $\epsilon$ is some user-specified error criterion.
  • Relative error
    • $|\Delta x_i/x_i|_{max}\le\epsilon,$
      • This is the maximum relative error in one of the variables.
    • $\frac{1}{n}\sum|\Delta x_i/x_i|\le\epsilon,$
      • This is the average relative error in the individual components.
    • $\left(\frac{1}{n}\sum(\Delta x_i/x_i)^2\right)^{1/2}\le\epsilon.$
      • This is root mean square error in the individual components.
  • Consider: $$\frac{\left(\sum\Delta x_i^2\right)^{1/2}}{\left(\sum x_i^2\right)^{1/2}} \le\epsilon.$$
    • This is length of the error vector relative to the length of the answer, and has a nice physical interpretation.
    • This is a relative error treating the system of unknowns as a whole, rather than considering each individual variable.
  • Normally, you test for convergence at each iteration. If you are converged (or if you hit some user-specified maximum number of steps), you quit the iteration loop and return. (A sketch collecting the error measures above follows.)
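
A minimal sketch collecting these error measures for a change $\Delta x = x^{k+1}-x^k$ (the function name and dictionary keys are illustrative):

import numpy as np

def error_measures(dx, x):
    '''Collect the convergence measures listed above for a change dx = x_new - x_old (sketch).'''
    return {
        'abs_max': np.max(np.abs(dx)),                     # infinity-norm of the change
        'abs_sum': np.sum(np.abs(dx)),                     # 1-norm of the change
        'abs_2'  : np.sqrt(np.sum(dx**2)),                 # 2-norm of the change
        'rel_max': np.max(np.abs(dx/x)),                   # largest relative change in a component
        'rel_avg': np.mean(np.abs(dx/x)),                  # average relative change
        'rel_rms': np.sqrt(np.mean((dx/x)**2)),            # RMS relative change
        'rel_vec': np.linalg.norm(dx)/np.linalg.norm(x),   # length of change relative to length of x
    }

# Usage inside an iteration loop might look like:
# if error_measures(xnew - xold, xnew)['rel_vec'] <= tol:
#     break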

Example Code

  • Write a solver for $Ax=b$ systems using the Jacobi method $$x_i^{k+1} = x_i^k + \frac{1}{a_{i,i}}R_{i}^k,$$ $$R_i^k = \left(b_i - \sum_{j=1}^na_{i,j}x_j^k\right), \quad\text{or in vector form, } R^k = b - Ax^k.$$
import numpy as np

def jacobi(A,b,x, tol=1E-5, maxit=10000):
    ''' 
    Program solves Ax=b linear systems for x using the Jacobi iteration method
    A is a 2-D, square input matrix (n x n)
    b is a 1-D input array (length n)
    x is a 1-D input initial guess of the solution (length n)
    Function returns the solution x and the number of iterations required.
    '''
    
    for k in range(1,maxit+1):                            # iteration loop
        xold = x.copy()                                   # reset xold = xnew each time 
        R = b - np.dot(A,xold)                            # compute the Residual (error)
        x = xold + R/np.diag(A)                           # update the solution
        #RE = np.sqrt(np.sum(R**2))/np.sqrt(np.sum(x**2))  # relative error
        RE = np.linalg.norm(R)/np.linalg.norm(x)          # relative error
        if RE <= tol:                                     # check for convergence
            return x, k                                   # return solution and number of iterations
    
    print('Warning, reached', maxit, ' iterations without converging to tolerance =', tol)
    return x, maxit

Test the function

Take $A$ as a penta-diagonal matrix with $4$ on the main diagonal, $-1$ on the first sub- and super-diagonals, and $1$ on the third sub- and super-diagonals. We’ll solve the problem for $n=60$.

We’ll let $b$ be a vector of ones, and use a vector of zeros for the initial guess for $x$.

n = 60

A = np.diag(np.ones(n-3),-3)  + np.diag(-1*np.ones(n-1),-1) + \
    np.diag(4*np.ones(n),0) + np.diag(-1.*np.ones(n-1),1)   + \
    np.diag(np.ones(n-3),3)
b  = np.ones(n)
x0 = np.zeros(n)

x, k_iter = jacobi(A,b,x0)

print(f"The problem required {k_iter} iterations")

x_py = np.linalg.solve(A,b)
reAvg = np.mean(np.abs((x_py-x)/x_py))
print(f"\nThe average relative error between the Jacobi function and np.linalg.solve is {reAvg:.3e}")
print(f"\nx =\n{x}")
The problem required 31 iterations

The average relative error between the Jacobi function and np.linalg.solve is 4.824e-06

x =
[0.27184421 0.35220014 0.35220014 0.2648233  0.21524253 0.20822162
 0.23630615 0.25987558 0.26646384 0.25805109 0.24802253 0.24384705
 0.24597541 0.24993196 0.25214914 0.25182954 0.25037258 0.2493163
 0.24922614 0.24972154 0.25018695 0.25030629 0.25015446 0.24996385
 0.24988824 0.2499289  0.25000129 0.25003442 0.25001996 0.2499948
 0.2499948  0.25001996 0.25003442 0.25000129 0.2499289  0.24988824
 0.24996385 0.25015446 0.25030629 0.25018695 0.24972154 0.24922614
 0.2493163  0.25037258 0.25182954 0.25214914 0.24993196 0.24597541
 0.24384705 0.24802253 0.25805109 0.26646384 0.25987558 0.23630615
 0.20822162 0.21524253 0.2648233  0.35220014 0.35220014 0.27184421]