2. Discrete Inverse Problems and Regularization

In this lecture we consider inverse problems that can be posed as a system of linear equations

$$Ku = f,$$

with $K \in \mathbb{R}^{m\times n}$ a given matrix and $f \in \mathbb{R}^m$ the data. The goal is to find a solution $u \in \mathbb{R}^n$ that (approximately) satisfies the equations.

2.1. Well-posedness

The first questions we address in detail are those of existence, uniqueness and stability of the solution. We discern the following cases:

  • If $m = n$ and $K$ has full rank, it is invertible and the solution is given by

$$\tilde{u} = K^{-1}f.$$

  • If $m > n$ and $K$ has rank $n$, the system of equations may be inconsistent, in which case a solution does not exist; this happens whenever $f$ is not in the range of $K$.

  • If $m < n$ and $K$ has rank $m$, we can always find a solution, but it is not unique because $K$ has a non-trivial null-space.

  • If $K$ does not have maximal rank, the system of equations may be inconsistent and $K$ has a non-trivial null-space, so both existence and uniqueness can fail.

Stability in this context means that errors in the data do not get amplified too much. For now, we will only consider stability in the case where the solution exists and is unique. Consider the systems $K\tilde{u} = f$ and $K\tilde{u}^{\delta} = f^{\delta}$, for which we have

$$\|\tilde{u} - \tilde{u}^{\delta}\| = \|K^{-1}(f - f^{\delta})\| \leq \|K^{-1}\|\,\|f - f^{\delta}\|,$$

where $\|u\|$ denotes any vector norm on $\mathbb{R}^n$ and $\|K\|$ denotes its corresponding (induced) matrix norm.

Since $\|f\| = \|K\tilde{u}\| \leq \|K\|\,\|\tilde{u}\|$, we can express the relative error as

$$\frac{\|\tilde{u} - \tilde{u}^{\delta}\|}{\|\tilde{u}\|} \leq \|K\|\,\|K^{-1}\|\,\frac{\|f - f^{\delta}\|}{\|f\|}.$$

The quantity $\kappa(K) = \|K\|\,\|K^{-1}\|$ is called the condition number of $K$.
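
As a quick numerical illustration (a sketch using NumPy; the nearly singular $2\times 2$ matrix below is an arbitrary example), we can check that the relative error in the solution is indeed bounded by $\kappa(K)$ times the relative error in the data:

import numpy as np

# a nearly singular 2x2 matrix (arbitrary example)
K = np.array([[1.0, 1.0], [1.0, 1.0001]])
f = np.array([2.0, 2.0001])                 # exact data; solution is u = (1, 1)

u = np.linalg.solve(K, f)
f_delta = f + 1e-4*np.array([1.0, -1.0])    # small perturbation of the data
u_delta = np.linalg.solve(K, f_delta)

kappa = np.linalg.cond(K)                   # condition number kappa(K)
rel_err_u = np.linalg.norm(u - u_delta)/np.linalg.norm(u)
rel_err_f = np.linalg.norm(f - f_delta)/np.linalg.norm(f)

# the relative error in u is large, but stays below kappa(K) times that in f
print(rel_err_u, kappa*rel_err_f)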

2.2. Pseudo inverse

Next, we discuss how we may define solutions to inconsistent or underdetermined systems of equations. For this, we'll use the singular value decomposition (SVD) of the matrix $K$.


If $m > n$, the system is inconsistent when $f$ is not in the range of $K$. If $K$ has full rank, we can express it as

$$K = U_n\Sigma_nV_n^T,$$

with $U_n = (u_1, u_2, \ldots, u_n)$ and $V_n = (v_1, v_2, \ldots, v_n)$ containing the first $n$ left and right singular vectors, and $\Sigma_n$ a diagonal matrix containing the singular values $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n > 0$. We can then attempt to solve a modified system of equations

$$Ku = U_nU_n^Tf,$$

where $U_nU_n^Tf$ projects $f$ onto the range of $K$. We find

$$\tilde{u} = V_n\Sigma_n^{-1}U_n^Tf \equiv K^{\dagger}f,$$

where $K^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of $K$.

This coincides with the least-squares solution

$$\min_{u} \|Ku - f\|_2^2.$$

Indeed, by setting the gradient to zero, we find the normal equations

$$K^TKu = K^Tf,$$

so

$$\tilde{u} = (K^TK)^{-1}K^Tf = K^{\dagger}f.$$

Example: An overdetermined system of equations

Consider solving

$$\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$

The system of equations is clearly inconsistent: the first two equations give $u_1 = 0$ and $u_2 = 1$, which contradicts the third. The corresponding normal equations:

$$\begin{pmatrix} 6 & 5 \\ 5 & 6 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \end{pmatrix}$$

do have a unique solution, $u = (4/11, 4/11)^T$.
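
A quick numerical check of this example (a sketch; np.linalg.lstsq is used as an independent reference):

import numpy as np

K = np.array([[1., 1.], [2., 1.], [1., 2.]])
f = np.array([1., 1., 1.])

# solve the normal equations K^T K u = K^T f directly
u_normal = np.linalg.solve(K.T@K, K.T@f)

# compare with NumPy's least-squares solver
u_lstsq, *_ = np.linalg.lstsq(K, f, rcond=None)

print(u_normal, u_lstsq)   # both give (4/11, 4/11)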


If, on the other hand, $m < n$ and $K$ has full rank, a solution exists but is not unique. In this case, we can look for the smallest solution (i.e., one that has no contribution in the null-space of $K$). This means we look for a solution spanned by the first $m$ right singular vectors, denoted by $V_m = (v_1, v_2, \ldots, v_m)$:

$$KV_mz = f,$$

with $\tilde{u} = V_mz$. We find that this solution is given by

$$\tilde{u} = V_m\Sigma_m^{-1}U_m^Tf \equiv K^{\dagger}f,$$

where $U_m = (u_1, u_2, \ldots, u_m)$ and $\Sigma_m$ contains the non-zero singular values $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_m > 0$. Showing that this indeed yields the solution with the smallest norm is the subject of one of the exercises. The corresponding variational problem is

$$\min_{u} \|u\|_2^2 \quad \text{such that} \quad Ku = f.$$

Example: An underdetermined system of equations

Consider solving

$$\begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = 1.$$

Writing $u = v\,(1,1)^T$, the corresponding normal equation reads $2v = 1$, which gives the solution $u = (1/2, 1/2)^T$. In effect, this means that we have added an equation to the system: $u_1 - u_2 = 0$.
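
A quick numerical check (a sketch using np.linalg.pinv):

import numpy as np

K = np.array([[1., 1.]])
f = np.array([1.])

# minimum-norm solution via the pseudo-inverse
u_min = np.linalg.pinv(K)@f
print(u_min)                              # [0.5, 0.5]

# any u_min + c*(1,-1) also solves Ku = f, but has a larger norm
u_other = u_min + 0.3*np.array([1., -1.])
print(K@u_other, np.linalg.norm(u_min), np.linalg.norm(u_other))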


If the matrix does not have full rank, we may still employ the pseudo-inverse, which for a matrix of rank $k \leq \min\{m,n\}$ is defined as follows.

Definition: Moore-Penrose pseudo inverse

The pseudo-inverse of a matrix $K \in \mathbb{R}^{m\times n}$ with rank $k \leq \min\{m,n\}$ is defined in terms of the singular value decomposition as

$$K^{\dagger} = V_k\Sigma_k^{-1}U_k^T,$$

where $V_k = (v_1, v_2, \ldots, v_k)$, $U_k = (u_1, u_2, \ldots, u_k)$ and $\Sigma_k$ is a diagonal matrix containing the $k$ largest (i.e., the non-zero) singular values.
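
A minimal sketch of how this definition can be evaluated numerically (the tolerance tol used to determine the numerical rank is an assumption; np.linalg.pinv serves as a reference):

import numpy as np

def pinv_svd(K, tol=1e-12):
    # pseudo-inverse via the SVD, keeping only singular values above tol
    U, s, Vh = np.linalg.svd(K, full_matrices=False)
    k = np.sum(s > tol)                   # numerical rank
    return Vh[:k].T @ np.diag(1/s[:k]) @ U[:, :k].T

# a rank-1 example: every row is a multiple of (1, 2)
K = np.array([[1., 2.], [2., 4.], [0., 0.]])
print(np.allclose(pinv_svd(K), np.linalg.pinv(K)))   # True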


Note that while the pseudo-inverse allows us to define a unique solution, this solution is not necessarily stable, as $\|K\|_2\|K^{\dagger}\|_2 = \sigma_1/\sigma_k$ may still be large. To study this issue in more detail, we express the solution as

$$\tilde{u} = V_k\Sigma_k^{-1}U_k^Tf = \sum_{i=1}^{k}\frac{\langle u_i, f\rangle}{\sigma_i}v_i.$$

We note that the component of $f$ along $u_i$ is amplified by $\sigma_i^{-1}$. Thus, if $f$ has (noise) components that correlate with the $u_i$ corresponding to very small singular values, these noise components get amplified. Generally, we do not expect problems if $|\langle u_i, f\rangle|$ decays faster with $i$ than the singular values $\sigma_i$. This is called the discrete Picard condition [Hansen, 1990].

Definition: Discrete Picard Condition

The vector $f \in \mathbb{R}^m$ satisfies the discrete Picard condition for the problem $Ku = f$ if the coefficients $|\langle u_i, f\rangle|$ decay faster than the singular values $\sigma_i$ of $K$, where $u_i$ denote the left singular vectors of $K$.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
from myst_nb import glue

# define forward operator
def getK(n):
    x = np.linspace(0,1,n)
    K = np.diag(np.exp(-5*x))

    return K,x

# parameters
n = 100
sigma = 1e-2
K,x = getK(n)

# define ground truth and compute data
u = np.exp(-10*x)
f = K@u

# add noise
noise = np.random.randn(n)
f_delta = f + sigma*noise

# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)

# apply pseudo inverse
uhat = Vh.T@np.diag(1/s)@U.T@f_delta

# plot
fig, ax = plt.subplots(1,1)

ax.semilogy(s,'*',label=r'$\sigma_i$')
ax.semilogy(np.abs(U.T@f),'o',label=r'$|\langle u_i, f\rangle|$')
ax.semilogy(np.abs(U.T@f_delta),'o',label=r'$|\langle u_i, f^{\delta}\rangle|$')
ax.semilogy(np.abs(U.T@f) + np.sqrt(2/np.pi)*sigma,'k--')
ax.set_xlabel(r'$i$')
ax.set_ylim([1e-6,2])
ax.set_xlim([0,n])
ax.legend()

glue("picard", fig, display=False)

Example: An ill-posed problem

As forward operator we take a diagonal matrix with entries

$$(K)_{ii} = \exp(-5i/(n-1)) \quad \text{for} \quad i = 0, 1, \ldots, n-1.$$

We generate noisy data $f^{\delta} = Ku + e$ with $u_i = \exp(-10i/(n-1))$ and $e_i$ normally distributed with mean zero and variance $\sigma^2$. We then find that

$$|\langle u_i, f\rangle| = |f_i| = \exp(-15i/(n-1)),$$

and

$$|\langle u_i, f^{\delta}\rangle| = |\exp(-15i/(n-1)) + e_i| \leq |\langle u_i, f\rangle| + |e_i|.$$

Since $e_i$ is normally distributed, its absolute value follows a folded normal distribution, whose expected value is

$$\mathbb{E}(|e_i|) = \sqrt{\frac{2}{\pi}}\,\sigma.$$

We conclude that $|\langle u_i, f^{\delta}\rangle|$ does not decay exponentially as the singular values do, and hence $f^{\delta}$ does not satisfy the discrete Picard condition.


Fig. 2.1 An example of the singular values and the coefficients $|\langle u_i, f\rangle|$ and $|\langle u_i, f^{\delta}\rangle|$ for $n = 100$ and $\sigma = 10^{-2}$. We see that $f$ does satisfy the discrete Picard condition while $f^{\delta}$ does not.

2.3. Regularisation

To stabilize the problem, we can modify the pseudo-inverse to avoid dividing by small singular values. One option is to simply ignore the small singular values: choose a cut-off value $\alpha$ and define the solution as

$$\tilde{u}_{\alpha} = V_kr_{\alpha}(\Sigma_k)U_k^Tf, \tag{2.1}$$

where $r_{\alpha}$ is applied component-wise to the diagonal of the matrix $\Sigma_k$ as

$$r_{\alpha}(\sigma) = \begin{cases} \sigma^{-1} & \text{if } \sigma \geq \alpha, \\ 0 & \text{otherwise.} \end{cases}$$

This gives rise to the Truncated Singular Value Decomposition (TSVD).

Definition: Truncated Singular Value Decomposition (TSVD)

The TSVD-regularized solution to the equation $Ku = f$ is given by

$$\tilde{u}_{\alpha} = \sum_{i=1}^{k_{\alpha}}\frac{\langle u_i, f\rangle}{\sigma_i}v_i,$$

where $\{(u_i, v_i, \sigma_i)\}_{i=1}^{k}$ denotes the singular system of $K$ and $k_{\alpha} \leq k$ is chosen such that $\sigma_i \geq \alpha$ for $i \leq k_{\alpha}$.
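
A minimal sketch of how the TSVD solution can be computed in practice (the function name and the way the cut-off alpha is passed are just illustrative choices):

import numpy as np

def tsvd_solve(K, f, alpha):
    # TSVD-regularised solution: keep only singular values >= alpha
    U, s, Vh = np.linalg.svd(K, full_matrices=False)
    k_alpha = np.sum(s >= alpha)
    return Vh[:k_alpha].T @ ((U[:, :k_alpha].T @ f)/s[:k_alpha])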


Another option to avoid dividing by small singular values is to add a small positive constant to shift the singular values away from zero. This gives rise to Tikhonov regularization.

Definition: Tikhonov regularisation

The Tikhonov-regularised solution to the equation $Ku = f$ is given by

$$\tilde{u}_{\alpha} = \sum_{i=1}^{k}\frac{\sigma_i\langle u_i, f\rangle}{\sigma_i^2 + \alpha}v_i,$$

where $\{(u_i, v_i, \sigma_i)\}_{i=1}^{k}$ is the singular system of $K$. This corresponds to setting $r_{\alpha}(s) = s/(s^2 + \alpha)$ in (2.1).

Unlike the TSVD, Tikhonov regularisation has a corresponding variational problem:

$$\min_{u} \|Ku - f\|_2^2 + \alpha\|u\|_2^2,$$

whose well-posedness can be easily established by writing down the corresponding normal equations

$$\tilde{u}_{\alpha} = (K^TK + \alpha I)^{-1}K^Tf = V_k(\Sigma_k^2 + \alpha I)^{-1}\Sigma_kU_k^Tf.$$

Indeed, it is easily verified that $K^TK + \alpha I$ has full rank whenever $\alpha > 0$.
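
A minimal sketch verifying numerically that the normal-equations form and the SVD filter give the same Tikhonov solution (the random test matrix and alpha = 0.1 are arbitrary choices):

import numpy as np

np.random.seed(0)
K = np.random.randn(20, 10)
f = np.random.randn(20)
alpha = 0.1

# via the normal equations (K^T K + alpha I) u = K^T f
u_normal = np.linalg.solve(K.T@K + alpha*np.eye(10), K.T@f)

# via the SVD filter r_alpha(s) = s/(s^2 + alpha)
U, s, Vh = np.linalg.svd(K, full_matrices=False)
u_svd = Vh.T @ (s/(s**2 + alpha)*(U.T@f))

print(np.allclose(u_normal, u_svd))   # True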


Of course, one may think of other types of regularization by defining an appropriate function $r_{\alpha}$. Intuitively, we would like

  • $r_{\alpha}(s) \rightarrow s^{-1}$ as $\alpha \rightarrow 0$

  • $r_{\alpha}(0) < \infty$ for any $\alpha > 0$

  • $r_{\alpha}(s) \rightarrow 0$ as $\alpha \rightarrow \infty$

One example is Lavrentiev regularization, with:

$$r_{\alpha}(s) = (s + \alpha)^{-1}.$$
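
To compare the behaviour of the three choices, we can plot the filter functions $r_{\alpha}(s)$ side by side (a sketch; $\alpha = 0.1$ is an arbitrary value):

import numpy as np
import matplotlib.pyplot as plt

alpha = 0.1
s = np.linspace(1e-3, 1, 500)

r_tsvd = np.where(s >= alpha, 1/s, 0)    # truncated SVD
r_tikhonov = s/(s**2 + alpha)            # Tikhonov
r_lavrentiev = 1/(s + alpha)             # Lavrentiev

plt.semilogy(s, 1/s, 'k:', label=r'$s^{-1}$')
plt.semilogy(s, r_tsvd, label='TSVD')
plt.semilogy(s, r_tikhonov, label='Tikhonov')
plt.semilogy(s, r_lavrentiev, label='Lavrentiev')
plt.xlabel(r'$s$')
plt.ylabel(r'$r_\alpha(s)$')
plt.legend()
plt.show()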

Conditions on regularization schemes and methods for choosing $\alpha$ will be discussed in more detail in the chapter on Inverse Problems in Function Spaces.


In general, we can think of regularization for linear problems as defining a modified pseudo-inverse of $K$:

$$K_{\alpha}^{\dagger} = V_kr_{\alpha}(\Sigma_k)U_k^T.$$

Stability of regularized solutions defined by (2.1) then follows by considering the largest singular value of $K_{\alpha}^{\dagger}$, which can be made arbitrarily small by increasing $\alpha$. On the other hand, increasing $\alpha$ will also introduce a bias in the solution. This trade-off is called the bias-variance trade-off:

Definition: Bias-variance trade-off

Here, we compare the regularised solution from noisy data, $\tilde{u}^{\alpha,\delta} = K_{\alpha}^{\dagger}f^{\delta}$, with the ideal solution $\tilde{u} = K^{\dagger}f$:

$$\|\tilde{u} - \tilde{u}^{\alpha,\delta}\| \leq \underbrace{\|(K^{\dagger} - K_{\alpha}^{\dagger})f\|}_{\text{bias}} + \underbrace{\|K_{\alpha}^{\dagger}(f - f^{\delta})\|}_{\text{variance}}.$$

As $\alpha \rightarrow 0$, the bias goes to zero, but possibly at the cost of a large variance. A large $\alpha$, on the other hand, reduces the variance but leads to a large bias. Ideally, the regularisation parameter is chosen in such a way that the small singular values are stabilised and the large ones are hardly affected.

We can make a similar decomposition of the error between the ground truth $u$ and the regularised solution:

$$\|u - \tilde{u}^{\alpha,\delta}\| \leq \|u - K_{\alpha}^{\dagger}f\| + \|K_{\alpha}^{\dagger}(f - f^{\delta})\|.$$
import numpy as np
import matplotlib.pyplot as plt
from myst_nb import glue

# define forward operator
def getK(n):
    x = np.linspace(0,1,n)
    K = np.diag(np.exp(-5*x))

    return K,x

# parameters
n = 100
sigma = 1e-2
K,x = getK(n)

# define ground truth and compute data
u = np.exp(-10*x)
f = K@u

# add noise
noise = np.random.randn(n)
f_delta = f + sigma*noise

# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)

# error, bias and variance for TSVD
error = np.zeros(n)
bias = np.zeros(n)
variance = np.zeros(n)

for k in range(1,n):
  uk = Vh[:k,:].T@np.diag(1/s[:k])@U[:,:k].T@f
  uk_delta = Vh[:k,:].T@np.diag(1/s[:k])@U[:,:k].T@f_delta

  error[k] = np.linalg.norm(u - uk_delta)
  bias[k] = np.linalg.norm(u - uk)
  variance[k] = np.linalg.norm(uk - uk_delta)

# plot
k = np.linspace(0,n-1,n)

fig, ax = plt.subplots(1,1)

ax.plot(k,bias,label='bias')
ax.plot(k,variance,label='variance')
ax.plot(k,np.exp(5*k/(n-1))*sigma*np.sqrt(k+1) + np.exp(-10*(k+1)/(n-1))*np.sqrt(n-k-1),'k--')

ax.set_xlabel(r'$k$')
ax.set_ylabel(r'error')
ax.set_xlim([1,n])
ax.set_ylim([0,5])
ax.legend()

glue("bias_variance",fig,display=False)

Example: An ill-posed problem, cont'd

We use the same forward operator and ground truth as in the previous example and apply TSVD regularisation. Using the truncation rank, $k$, as the regularisation parameter, we find that

$$\left(K_k^{\dagger}f^{\delta}\right)_i = \begin{cases} u_i + \exp(5i/(n-1))\,e_i & i \leq k, \\ 0 & i > k. \end{cases}$$

Thus, the error becomes

$$\|u - K_k^{\dagger}f^{\delta}\|_2^2 = \sum_{i=0}^{k}e^{10i/(n-1)}e_i^2 + \sum_{i=k+1}^{n-1}e^{-20i/(n-1)},$$

in which we recognise the variance (first term) and bias (second term). In expectation, an upper bound for the error is then given by $\sqrt{k+1}\,e^{5k/(n-1)}\sigma + \sqrt{n-k-1}\,e^{-10(k+1)/(n-1)}$ (the dashed line in Fig. 2.2). While not very tight, it does suggest that there is an optimal $k$.


Fig. 2.2 An example of the error $\|u - K_k^{\dagger}f^{\delta}\|_2$ and the corresponding bias and variance for $n = 100$ and $\sigma = 10^{-2}$. As predicted, the bias decreases with $k$ while the variance increases. The optimal $k$ for this noise level lies at $k \approx 30$. Look at Fig. 2.1; do you see why this is the optimal $k$?

2.4. Exercises

2.4.1. Pseudo-inverse

Show that for a given matrix $K \in \mathbb{R}^{m\times n}$:

  1. $KK^{\dagger}$ is an orthogonal projection onto the range of $K$

  2. $I - K^{\dagger}K$ is an orthogonal projection onto the null-space of $K$
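
These properties can be checked numerically before attempting the proof; a small sketch using a random rank-deficient matrix (a sanity check only, not a substitute for the proof):

import numpy as np

np.random.seed(1)
K = np.random.randn(5, 3) @ np.random.randn(3, 4)   # a 5x4 matrix of rank 3
K_pinv = np.linalg.pinv(K)

P_range = K @ K_pinv                  # candidate projection onto range(K)
P_null = np.eye(4) - K_pinv @ K       # candidate projection onto null(K)

# orthogonal projections are symmetric and idempotent
print(np.allclose(P_range, P_range.T), np.allclose(P_range@P_range, P_range))
print(np.allclose(P_null, P_null.T), np.allclose(P_null@P_null, P_null))

# P_range leaves the columns of K unchanged; K maps the null-space part to zero
print(np.allclose(P_range@K, K), np.allclose(K@P_null, 0))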

2.4.2. Least-squares and minimum-norm solutions:

Given a system of equations $Ku = f$ with $K \in \mathbb{R}^{m\times n}$:

  1. For $m > n$ and $\text{rank}(K) = n$, show that the pseudo-inverse gives the solution $u = K^{\dagger}f$ with the smallest residual $\|Ku - f\|_2$

  2. For $m < n$ and $\text{rank}(K) = m$, show that the pseudo-inverse gives the solution $u = K^{\dagger}f$ with the smallest norm $\|u\|_2$

  3. For $m > n$ and $\text{rank}(K) = r < n$, show that the pseudo-inverse gives the solution that minimizes both $\|Ku - f\|_2$ and $\|u\|_2$.

2.4.3. Tikhonov regularization

Show that the solution to the variational problem

$$\min_{u} \|Ku - f\|_2^2 + \alpha\|u\|_2^2$$

is given in terms of the SVD of $K \in \mathbb{R}^{m\times n}$ as

$$\tilde{u} = \sum_{i=1}^{k}\frac{\sigma_i\langle u_i, f\rangle}{\sigma_i^2 + \alpha}v_i,$$

where $k = \text{rank}(K)$.

2.4.4. Gravity surveying

We can estimate the density distribution in the subsurface by measuring the local gravitational pull. The density profile $u(x)$ is related to such measurements by a linear operator

$$Ku(x) = \int_0^1 \frac{u(y)}{\left(1 + (x - y)^2\right)^{3/2}}\,\mathrm{d}y.$$

Upon discretization with stepsize $h = 1/n$, the inverse problem can be cast as a system of $n$ equations in $n$ unknowns, $Ku = f$.

You can use the code provided below to generate the matrix and noisy data for a given $u(x)$.

  1. Plot the coefficients $\langle u_i, f\rangle$ and the singular values $\sigma_i$ to check the discrete Picard condition. What do you notice?

  2. Solve the inverse problem for noisy data using the (regularized) pseudo-inverse; compute the optimal $\alpha$ by computing the bias and variance components of the error. Is this a practically feasible way to compute the optimal $\alpha$?

  3. Compare the solutions for $\alpha < \alpha_{\mathrm{opt}}$, $\alpha = \alpha_{\mathrm{opt}}$ and $\alpha > \alpha_{\mathrm{opt}}$ to the ground truth. What do you notice?

import numpy as np

def getK(n):
    h = 1/n
    x = np.linspace(h/2,1-h/2,n)
    xx,yy = np.meshgrid(x,x)
    K = h/(1 + (xx - yy)**2)**(3/2)

    return K,x
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300

# define forward operator
n = 100
delta = 1e-2
K,x = getK(n)

# define ground truth and compute data
u = np.sin(np.pi*x) + 0.5*np.sin(2*np.pi*x)
f = K@u

# add noise
noise = np.random.randn(n)
f_delta = f + delta*noise

# plot ground truth and data
fig, axs = plt.subplots(1,2)

axs[0].plot(x,u)
axs[0].set_xlabel('x')
axs[0].set_ylabel('u(x)')

axs[1].plot(x,f,label='clean data')
axs[1].plot(x,f_delta,label='noisy data')

axs[1].set_xlabel('x')
axs[1].set_ylabel('f(x)')
axs[1].legend()

fig.set_figwidth(10)
fig.tight_layout()
plt.show()
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300

# define forward operator
def getK(n):
  h = 1/n;
  x = np.linspace(h/2,1-h/2,n)
  xx,yy = np.meshgrid(x,x)
  K = h/(1 + (xx - yy)**2)**(3/2)

  return K,x

# parameters
n = 100
sigma = 1e-2
K,x = getK(n)

# define ground truth and compute data
u = np.sin(np.pi*x) + 0.5*np.sin(2*np.pi*x)
f = K@u

# add noise
np.random.seed(2)
noise = np.random.randn(n)
f_delta = f + sigma*noise

# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)

# regularised pseudo inverse using Tikhonov
R_alpha = lambda alpha : Vh.T@np.diag(s/(s**2 + alpha))@U.T

# error, bias and variance for Tikhonov
na = 20
alpha = np.linspace(1e-6,2e-5,na)
error = np.zeros(na)
bias = np.zeros(na)
variance = np.zeros(na)

for k in range(na):
    error[k] = np.linalg.norm(u - R_alpha(alpha[k])@f_delta)
    bias[k] = np.linalg.norm(u - R_alpha(alpha[k])@f)
    variance[k] = np.linalg.norm(R_alpha(alpha[k])@(f-f_delta))

# plot
fig,ax = plt.subplots(1,3)

ax[0].semilogy(s,'*',label=r'$\sigma_i$')
ax[0].semilogy(np.abs(U.T@f),'o',label=r'$|\langle u_i, f\rangle|$')
ax[0].semilogy(np.abs(U.T@f_delta),'o',label=r'$|\langle u_i, f^\delta\rangle|$')
ax[0].set_xlabel(r'$i$')
ax[0].set_ylabel(r'magnitude')
ax[0].set_ylim([1e-16,2])
ax[0].set_xlim([0,n])
ax[0].set_aspect('auto')
ax[0].legend()

ax[1].plot(alpha,error,label='total error')
ax[1].plot(alpha,bias,label='bias')
ax[1].plot(alpha,variance,label='variance')
ax[1].set_xlabel(r'$\alpha \cdot 10^{5}$')
ax[1].set_ylabel(r'error')
ax[1].set_xticks([5e-6,1e-5,1.5e-5,2e-5])
ax[1].set_xticklabels(['.5','1.0','1.5','2.0'])
ax[1].set_aspect('auto')
ax[1].legend()

ax[2].plot(x,u,label='ground truth')
ax[2].plot(x,R_alpha(1e-6)@f_delta,label=r'$\alpha = 10^{-6}$')
ax[2].plot(x,R_alpha(1.5e-5)@f_delta,label=r'$\alpha = 1.5\cdot 10^{-5}$')
ax[2].plot(x,R_alpha(1e-4)@f_delta,label=r'$\alpha = 10^{-4}$')
ax[2].set_ylim([-2,2])
ax[2].set_xlabel(r'$x$')
ax[2].set_ylabel(r'$u(x)$')
ax[2].set_aspect('auto')
ax[2].legend()

fig.set_figwidth(10)
fig.tight_layout()

2.5. Assignments

2.5.1. Convolution

We are going to solve a deconvolution problem $Ku = f$, where $K$ is a Toeplitz matrix with elements

$$k_{ij} = \frac{\exp\left(-a\left((i-j)/(n-1)\right)^2\right)}{(n-1)\sqrt{\pi/a}},$$

and we are given noisy measurements

$$f^{\delta} = Ku + e,$$

where the entries of $e$ are normally distributed with mean zero and variance $\delta^2$.

The goal of this assignment is to solve this inverse problem using a (truncated) SVD for two scenarios:

  1. $u(x) = H(x - 0.3) - H(x - 0.7)$,

  2. $u(x) = x(1 - x)$,

where $H$ denotes the Heaviside step function.

For each of the two scenarios answer the following questions:

  1. Is this problem ill-posed?

  2. Compute the (pseudo-)inverse of $K$ using the SVD and compute the backward error $\|u - u^{\delta}\|_2$ for noise levels $\delta = 0.001, 0.01, 0.1$ for both scenarios; what do you notice?

  3. Compute a regularized solution using a truncated SVD for noise levels $\delta = 0.001, 0.01, 0.1$ for both scenarios. Manually choose the truncation parameter $k$ in each case to get the best possible solution. What do you notice here?

  4. Explain your observations by investigating what the singular vectors look like (think about the discrete Picard condition as well).

import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
# parameters and grid
n = 100
a = 100
delta = 1e-1
x = np.linspace(0,1,n)

# define forward operator
c = np.exp(-a*x**2)/((n-1)*np.sqrt(np.pi/a))
K = la.toeplitz(c)

# ground truth and data
u = abs(x - 0.5)<.2
f = K@u + delta*np.random.randn(n)

# plot
plt.plot(x,u,label='u')
plt.plot(x,f,label='f')
plt.legend()
plt.show()
# parameters and grid
n = 100
a = 100
delta = 1e-1
x = np.linspace(0,1,n)

# define forward operator
c = np.exp(-a*x**2)/((n-1)*np.sqrt(np.pi/a))
K = la.toeplitz(c)

# ground truth and data
u = x*(1-x)
f = K@u + delta*np.random.randn(n)

# plot
plt.plot(x,u,label='u')
plt.plot(x,f,label='f')
plt.legend()
plt.show()