2. Discrete Inverse Problems and Regularization
In this lecture we consider inverse problems that can be posed as a system of linear equations

$$Ku = f,$$

with $K \in \mathbb{R}^{m\times n}$ the forward operator, $u \in \mathbb{R}^n$ the unknown and $f \in \mathbb{R}^m$ the data.
2.1. Well-posedness
The first questions we address in detail are those of existence, uniqueness and stability of the solution. We discern the following cases (a small numerical sketch below illustrates them):

- If $m = n$ and $K$ has full rank, it is invertible and the solution is given by $u = K^{-1}f$.
- If $m > n$ and $K$ has rank $n$, the system of equations may be inconsistent, in which case a solution does not exist when $f$ is not in the range of $K$.
- If $m < n$ and $K$ has rank $m$, we can always find a solution but it may not be unique because $K$ has a non-trivial null-space.
- If $K$ does not have maximal rank, the system of equations may be both inconsistent and $K$ can have a non-trivial null-space.
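To make the case distinction concrete, here is a minimal sketch (with small made-up matrices, not taken from the lecture) that checks the rank and consistency of each type of system with numpy:

```python
import numpy as np

# square, full rank: a unique solution exists
K1 = np.array([[2., 1.], [1., 3.]])
print(np.linalg.matrix_rank(K1))                       # 2 -> invertible
print(np.linalg.solve(K1, np.array([1., 0.])))

# overdetermined (3 x 2), full column rank: may be inconsistent
K2 = np.array([[1., 0.], [0., 1.], [1., 1.]])
f2 = np.array([1., 1., 0.])                            # not in the range of K2
u2 = np.linalg.lstsq(K2, f2, rcond=None)[0]
print(np.linalg.norm(K2 @ u2 - f2))                    # nonzero residual -> no exact solution

# underdetermined (2 x 3), full row rank: solutions exist but are not unique
K3 = np.array([[1., 0., 1.], [0., 1., 1.]])
print(np.linalg.matrix_rank(K3))                       # 2 -> one-dimensional null-space
```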
Stability in this context means that errors in the data do not get amplified too much. For now, we'll only consider stability in case the solution exists and is unique. Consider the systems

$$Ku = f, \quad Ku^{\delta} = f^{\delta},$$

where $f^{\delta} = f + e$ denotes the noisy data. Since $u - u^{\delta} = K^{-1}(f - f^{\delta})$, we have

$$\|u - u^{\delta}\| \leq \|K^{-1}\|\,\|f - f^{\delta}\|, \quad \text{and hence} \quad \frac{\|u - u^{\delta}\|}{\|u\|} \leq \kappa(K)\,\frac{\|f - f^{\delta}\|}{\|f\|}.$$

The quantity $\kappa(K) = \|K\|\,\|K^{-1}\|$ is called the condition number of $K$; when it is large, small relative errors in the data can lead to large relative errors in the solution.
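As a quick illustration, the sketch below (using a small diagonal operator of the same type as in the example later in this lecture) compares the relative error in the data to the relative error in the solution and to the bound given by the condition number:

```python
import numpy as np

np.random.seed(0)
n = 10
K = np.diag(np.exp(-5 * np.linspace(0, 1, n)))   # ill-conditioned diagonal operator
u = np.random.randn(n)
f = K @ u

e = 1e-3 * np.random.randn(n)                    # small perturbation of the data
u_delta = np.linalg.solve(K, f + e)

rel_data = np.linalg.norm(e) / np.linalg.norm(f)
rel_sol = np.linalg.norm(u - u_delta) / np.linalg.norm(u)
print("condition number        :", np.linalg.cond(K))
print("relative data error     :", rel_data)
print("relative solution error :", rel_sol)
print("bound kappa * data error:", np.linalg.cond(K) * rel_data)
```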
2.2. Pseudo inverse
Next, we discuss how we may define solutions to inconsistent or underdetermined systems of equations. For this, we'll use the singular value decomposition (SVD) of the matrix $K$:

$$K = U\Sigma V^T,$$

with $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ orthogonal matrices and $\Sigma \in \mathbb{R}^{m\times n}$ a diagonal matrix containing the singular values $\sigma_1 \geq \sigma_2 \geq \ldots \geq 0$.

If $m > n$ and $K$ has rank $n$, we can define a solution via the pseudo-inverse

$$u = K^{\dagger}f = V\Sigma_n^{-1}U_n^T f = \sum_{i=1}^{n}\frac{\langle u_i, f\rangle}{\sigma_i}v_i,$$

where $U_n$ contains the first $n$ columns of $U$ and $\Sigma_n$ the $n$ non-zero singular values, and where $u_i$ and $v_i$ denote the columns of $U$ and $V$.

This coincides with the least-squares solution

$$\min_u \|Ku - f\|_2^2.$$

Indeed, by setting the gradient to zero, we find the normal equations

$$K^TKu = K^Tf,$$

so $u = (K^TK)^{-1}K^Tf = V\Sigma_n^{-1}U_n^Tf = K^{\dagger}f$.
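A minimal sketch (with a random, made-up matrix) verifying that the SVD-based pseudo-inverse, the normal equations and numpy's least-squares solver all give the same solution for an overdetermined full-rank system:

```python
import numpy as np

np.random.seed(1)
K = np.random.randn(8, 3)          # overdetermined, full column rank with probability 1
f = np.random.randn(8)

U, s, Vh = np.linalg.svd(K, full_matrices=False)
u_svd = Vh.T @ ((U.T @ f) / s)                    # V Sigma^{-1} U^T f
u_normal = np.linalg.solve(K.T @ K, K.T @ f)      # normal equations
u_lstsq = np.linalg.lstsq(K, f, rcond=None)[0]    # numpy's least-squares solver

print(np.allclose(u_svd, u_normal), np.allclose(u_svd, u_lstsq))   # True True
```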
Example: An overdetermined system of equations
Consider solving
The system of equations is obviously not consistent. The corresponding normal equations:
do have a unique solution,
If, on the other hand, $m < n$ and $K$ has rank $m$, a solution can be defined through the pseudo-inverse

$$u = K^{\dagger}f = V_m\Sigma_m^{-1}U^Tf = \sum_{i=1}^{m}\frac{\langle u_i, f\rangle}{\sigma_i}v_i,$$

with $V_m$ the first $m$ columns of $V$ and $\Sigma_m$ the $m\times m$ diagonal matrix of non-zero singular values, so that $K^{\dagger} = K^T(KK^T)^{-1}$ in this case. This solution satisfies $Ku = f$ exactly and is the minimum-norm solution

$$\min_u \|u\|_2 \quad \text{such that} \quad Ku = f,$$

where any other solution differs from it by a null-space component of $K$.
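A similar sketch (again with a made-up matrix) showing that the pseudo-inverse of an underdetermined full-rank system solves the equations exactly and that adding a null-space component only increases the norm:

```python
import numpy as np

np.random.seed(2)
K = np.random.randn(3, 8)                        # underdetermined, full row rank
f = np.random.randn(3)

u_min = K.T @ np.linalg.solve(K @ K.T, f)        # K^T (K K^T)^{-1} f
print(np.allclose(K @ u_min, f))                 # an exact solution

U, s, Vh = np.linalg.svd(K, full_matrices=True)
z = Vh[-1, :]                                    # a null-space vector: K z = 0
print(np.allclose(K @ z, 0))
print(np.linalg.norm(u_min), np.linalg.norm(u_min + z))   # the norm only increases
```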
Example: An underdetermined system of equations
Consider solving
The corresponding normal equations are given by
If the matrix does not have full rank we may still employ the pseudo-inverse, which for a matrix of rank $k \leq \min(m,n)$ is defined as follows.

Definition: Moore-Penrose pseudo inverse

The pseudo-inverse of a matrix $K \in \mathbb{R}^{m\times n}$ with rank $k$ is defined in terms of its SVD as

$$K^{\dagger} = V_k\Sigma_k^{-1}U_k^T,$$

where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, and $\Sigma_k$ is the $k\times k$ diagonal matrix containing the non-zero singular values.

Note that while the pseudo inverse allows us to define a unique solution, it is not necessarily stable, as $\|K^{\dagger}\|_2 = 1/\sigma_k$ can be very large when the smallest non-zero singular value $\sigma_k$ is close to zero.
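The sketch below (with a small made-up rank-deficient matrix) checks that numpy's `pinv` coincides with the truncated SVD expression and that the norm of the pseudo-inverse equals one over the smallest non-zero singular value:

```python
import numpy as np

K = np.array([[1., 2., 3.],
              [2., 4., 6.],                      # = 2 x first row, so rank(K) = 2
              [1., 0., 1.]])
print(np.linalg.matrix_rank(K))                  # 2

U, s, Vh = np.linalg.svd(K)
k = 2                                            # (numerical) rank
K_pinv_svd = Vh[:k, :].T @ np.diag(1 / s[:k]) @ U[:, :k].T
print(np.allclose(np.linalg.pinv(K), K_pinv_svd))          # True

print(np.linalg.norm(K_pinv_svd, 2), 1 / s[k - 1])         # both equal 1/sigma_k
```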
We note that the component of $f$ along $u_i$ gets multiplied by $1/\sigma_i$; for the solution to remain bounded, the coefficients $\langle u_i, f\rangle$ should thus decay faster than the singular values $\sigma_i$. This requirement is formalised in the discrete Picard condition.

Definition: Discrete Picard Condition

The vector $f \in \mathbb{R}^m$ satisfies the discrete Picard condition for the problem $Ku = f$ if the coefficients $|\langle u_i, f\rangle|$ decay faster than the corresponding singular values $\sigma_i$.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
from myst_nb import glue
# define forward operator
def getK(n):
    x = np.linspace(0,1,n)
    K = np.diag(np.exp(-5*x))
    return K,x
# parameters
n = 100
sigma = 1e-2
K,x = getK(n)
# define ground truth and compute data
u = np.exp(-10*x)
f = K@u
# add noise
noise = np.random.randn(n)
f_delta = f + sigma*noise
# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)
# apply pseudo inverse
uhat = Vh.T@np.diag(1/s)@U.T@f_delta
# plot
fig, ax = plt.subplots(1,1)
ax.semilogy(s,'*',label=r'$\sigma_i$')
ax.semilogy(np.abs(U.T@f),'o',label=r'$|\langle u_i, f\rangle|$')
ax.semilogy(np.abs(U.T@f_delta),'o',label=r'$|\langle u_i, f^{\delta}\rangle|$')
ax.semilogy(np.abs(U.T@f) + np.sqrt(2/np.pi)*sigma,'k--')
ax.set_xlabel(r'$i$')
ax.set_ylim([1e-6,2])
ax.set_xlim([0,n])
ax.legend()
glue("picard", fig, display=False)

Example: An ill-posed problem

As forward operator we take a diagonal matrix with entries $k_{ii} = \exp(-5x_i)$, with $x_i$ equally spaced on $[0,1]$. We generate noisy data

$$f^{\delta} = Ku + e,$$

where the ground truth is given by $u(x) = \exp(-10x)$ and the entries of $e$ are normally distributed with mean zero and standard deviation $\sigma = 10^{-2}$. Since the expected value of $|\langle u_i, e\rangle|$ is $\sqrt{2/\pi}\,\sigma$, the coefficients $|\langle u_i, f^{\delta}\rangle|$ will not decay below this noise level, while the singular values keep decaying. We conclude that $f^{\delta}$ does not satisfy the discrete Picard condition and that applying the pseudo-inverse to the noisy data gives an unstable result.

Fig. 2.1 An example of the singular values $\sigma_i$ and the coefficients $|\langle u_i, f\rangle|$ and $|\langle u_i, f^{\delta}\rangle|$.
2.3. Regularisation
To stabilize the problem we can modify the pseudo inverse in several ways to avoid dividing by small singular values. One option is to simply ignore the small singular values: choose a cut-off (truncation index $k$) and define the solution as

$$u_k = \sum_{i=1}^{k}\frac{\langle u_i, f\rangle}{\sigma_i}v_i,$$

where only the $k$ largest singular values are retained. This gives rise to the Truncated Singular Value Decomposition (TSVD).

Definition: Truncated Singular Value Decomposition (TSVD)

The TSVD-regularized solution to the equation $Ku = f$ is given by

$$u_k = V_k\Sigma_k^{-1}U_k^Tf = \sum_{i=1}^{k}\frac{\langle u_i, f\rangle}{\sigma_i}v_i,$$

where $k \leq \text{rank}(K)$ is the truncation rank.
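A short sketch applying TSVD to the diagonal example from above; the truncation rank $k = 20$ is a hypothetical choice here, not prescribed by the lecture:

```python
import numpy as np

n = 100
x = np.linspace(0, 1, n)
K = np.diag(np.exp(-5 * x))
u = np.exp(-10 * x)
f_delta = K @ u + 1e-2 * np.random.randn(n)

U, s, Vh = np.linalg.svd(K)
k = 20                                            # truncation rank (chosen by hand)
u_tsvd = Vh[:k, :].T @ ((U[:, :k].T @ f_delta) / s[:k])
u_pinv = Vh.T @ ((U.T @ f_delta) / s)             # full pseudo-inverse, unstable

print(np.linalg.norm(u - u_tsvd), np.linalg.norm(u - u_pinv))
```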
Another option to avoid dividing by small singular values is to add a small positive constant to shift the singular values away from zero. This gives rise to Tikhonov regularization.

Definition: Tikhonov regularisation

The Tikhonov-regularised solution to the equation $Ku = f$ is given by

$$u_{\alpha} = \sum_{i}\frac{\sigma_i\langle u_i, f\rangle}{\sigma_i^2 + \alpha}v_i = V(\Sigma^T\Sigma + \alpha I)^{-1}\Sigma^TU^Tf,$$

where $\alpha > 0$ is the regularisation parameter.

Unlike the TSVD, Tikhonov regularisation has a corresponding variational problem:

$$\min_u \|Ku - f\|_2^2 + \alpha\|u\|_2^2,$$

whose well-posedness can be easily established by writing down the corresponding normal equations

$$(K^TK + \alpha I)u = K^Tf.$$

Indeed, it is easily verified that $K^TK + \alpha I$ is invertible for any $\alpha > 0$, since its eigenvalues are $\sigma_i^2 + \alpha > 0$.
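A minimal sketch (random made-up matrix) verifying that the SVD filter expression and the normal equations give the same Tikhonov-regularised solution:

```python
import numpy as np

np.random.seed(3)
m, n = 30, 30
K = np.random.randn(m, n)
f = np.random.randn(m)
alpha = 1e-1

U, s, Vh = np.linalg.svd(K)
u_svd = Vh.T @ (s / (s**2 + alpha) * (U.T @ f))                   # filter-factor form
u_normal = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ f)  # normal equations
print(np.allclose(u_svd, u_normal))                               # True
```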
Of course, one may think of other types of regularization by defining an appropriate spectral filter function $r_{\alpha}(\sigma)$ and setting

$$u_{\alpha} = \sum_i r_{\alpha}(\sigma_i)\langle u_i, f\rangle\, v_i,$$

where $r_{\alpha}(\sigma)$ remains bounded as $\sigma \downarrow 0$ and $r_{\alpha}(\sigma) \rightarrow 1/\sigma$ for any $\sigma > 0$ as $\alpha \downarrow 0$. One example is Lavrentiev regularization, with:

$$r_{\alpha}(\sigma) = \frac{1}{\sigma + \alpha}.$$
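To see how the different filter functions behave, here is a small sketch evaluating $1/\sigma$, the Tikhonov filter and the Lavrentiev filter for a few singular values; both regularised filters stay bounded as $\sigma \to 0$:

```python
import numpy as np

alpha = 1e-2
sigma = np.array([1.0, 1e-1, 1e-2, 1e-3, 1e-4])
print("1/sigma   :", 1 / sigma)                   # unbounded as sigma -> 0
print("Tikhonov  :", sigma / (sigma**2 + alpha))  # bounded by 1/(2 sqrt(alpha))
print("Lavrentiev:", 1 / (sigma + alpha))         # bounded by 1/alpha
```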
Conditions on regularization schemes and methods for choosing the regularisation parameter $\alpha$ are discussed in more detail in later lectures.
In general, we can think of regularization for linear problems as defining a modified pseudo-inverse of $K$:

$$K_{\alpha}^{\dagger} = V\, r_{\alpha}(\Sigma)\, U^T, \tag{2.1}$$

where $r_{\alpha}(\Sigma)$ applies the filter function element-wise to the singular values. Stability of regularized solutions defined by (2.1) then follows by considering the largest singular value of $K_{\alpha}^{\dagger}$, which is given by $\max_i r_{\alpha}(\sigma_i)$ and remains bounded for $\alpha > 0$.
Definition: Bias-variance trade-off

Here, we compare the regularised solution from noisy data, $K_{\alpha}^{\dagger}f^{\delta}$, to the ideal solution $K^{\dagger}f$. The error can be decomposed as

$$\|K^{\dagger}f - K_{\alpha}^{\dagger}f^{\delta}\| \leq \underbrace{\|(K^{\dagger} - K_{\alpha}^{\dagger})f\|}_{\text{bias}} + \underbrace{\|K_{\alpha}^{\dagger}(f - f^{\delta})\|}_{\text{variance}}.$$

If we choose more regularisation (larger $\alpha$ or smaller $k$), the variance term is suppressed but the bias grows; with less regularisation the bias shrinks but the variance grows. The regularisation parameter should balance the two. We can make a similar decomposition of the error between the ground truth $u$ and the regularised solution $K_{\alpha}^{\dagger}f^{\delta}$.
import numpy as np
import matplotlib.pyplot as plt
from myst_nb import glue
# define forward operator
def getK(n):
    x = np.linspace(0,1,n)
    K = np.diag(np.exp(-5*x))
    return K,x
# parameters
n = 100
sigma = 1e-2
K,x = getK(n)
# define ground truth and compute data
u = np.exp(-10*x)
f = K@u
# add noise
noise = np.random.randn(n)
f_delta = f + sigma*noise
# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)
# error, bias and variance for TSVD
error = np.zeros(n)
bias = np.zeros(n)
variance = np.zeros(n)
for k in range(1,n):
    uk = Vh[:k,:].T@np.diag(1/s[:k])@U[:,:k].T@f
    uk_delta = Vh[:k,:].T@np.diag(1/s[:k])@U[:,:k].T@f_delta
    error[k] = np.linalg.norm(u - uk_delta)
    bias[k] = np.linalg.norm(u - uk)
    variance[k] = np.linalg.norm(uk - uk_delta)
# plot
k = np.linspace(0,n-1,n)
fig, ax = plt.subplots(1,1)
ax.plot(k,bias,label='bias')
ax.plot(k,variance,label='variance')
ax.plot(k,np.exp(5*k/(n-1))*sigma*np.sqrt(k+1) + np.exp(-10*(k+1)/(n-1))*np.sqrt(n-k-1),'k--')
ax.set_xlabel(r'$k$')
ax.set_ylabel(r'error')
ax.set_xlim([1,n])
ax.set_ylim([0,5])
ax.legend()
glue("bias_variance",fig,display=False)

Example: An ill-posed problem, cont'd

We use the same forward operator and ground truth as in the previous example and use TSVD regularisation to regularise the problem. Denoting the truncation rank by $k$, the regularised solutions from the noise-free and noisy data are $u_k = V_k\Sigma_k^{-1}U_k^Tf$ and $u_k^{\delta} = V_k\Sigma_k^{-1}U_k^Tf^{\delta}$. Thus, the error becomes

$$\|u - u_k^{\delta}\| \leq \|u - u_k\| + \|u_k - u_k^{\delta}\|,$$

in which we recognise the bias and variance terms. In expectation, the upper bound for the error is then given by (approximately)

$$e^{-10k/n}\sqrt{n-k} + \sigma\, e^{5k/n}\sqrt{k},$$

which is plotted as the dashed line: the bias decreases with $k$ while the variance increases, and the truncation rank should balance the two.
2.4. Exercises
2.4.1. Pseudo-inverse
Show that for a given matrix $K \in \mathbb{R}^{m\times n}$ with pseudo-inverse $K^{\dagger}$:

- $KK^{\dagger}$ is an orthogonal projection onto the range of $K$;
- $I - K^{\dagger}K$ is an orthogonal projection onto the null-space of $K$.
Answer

Express the pseudo-inverse in terms of the SVD as

$$K^{\dagger} = V_k\Sigma_k^{-1}U_k^T,$$

and find $KK^{\dagger} = U_kU_k^T$. Because the columns of $U_k$ form an orthonormal basis for the range of $K$, we find that $KK^{\dagger}$ is an orthogonal projection onto the range of $K$. Following a similar argument we find that $K^{\dagger}K = V_kV_k^T$, where $V_k$ spans the orthogonal complement of the null-space of $K$ (recall that the null-space of $K$ is spanned by the remaining columns $v_{k+1}, \ldots, v_n$ of $V$). Hence $I - K^{\dagger}K = I - V_kV_k^T$ is an orthogonal projection onto the null-space of $K$.
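A quick numerical check of these projection properties, using a made-up rank-deficient matrix and numpy's `pinv`:

```python
import numpy as np

np.random.seed(4)
K = np.random.randn(5, 3) @ np.random.randn(3, 6)     # 5 x 6 matrix of rank 3
Kp = np.linalg.pinv(K)

P_range = K @ Kp                   # should be the orthogonal projector onto range(K)
P_null = np.eye(6) - Kp @ K        # should be the orthogonal projector onto null(K)

# idempotent and symmetric -> orthogonal projections
print(np.allclose(P_range @ P_range, P_range), np.allclose(P_range, P_range.T))
print(np.allclose(P_null @ P_null, P_null), np.allclose(P_null, P_null.T))
print(np.allclose(K @ P_null, np.zeros((5, 6))))       # image of P_null lies in null(K)
```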
2.4.2. Least-squares and minimum-norm solutions:
Given a system of equations $Ku = f$ with $K \in \mathbb{R}^{m\times n}$:

- For $m > n$ and rank($K$) $= n$, show that the pseudo-inverse gives the solution with the smallest residual $\|Ku - f\|_2$.
- For $m < n$ and rank($K$) $= m$, show that the pseudo-inverse gives the solution with the smallest norm $\|u\|_2$.
- For general $m$, $n$ and rank($K$) $= k < \min(m,n)$, show that the pseudo-inverse gives the solution that minimizes both $\|Ku - f\|_2$ and $\|u\|_2$.
Answer

1. We can construct the solution with the smallest residual by formulating the corresponding normal equations $K^TKu = K^Tf$. Since rank($K$) $= n$, these have a unique solution. Expressing $K^TK$ and $K^T$ in terms of the SVD of $K$ we find that $(K^TK)^{-1}K^T = V\Sigma_n^{-1}U_n^T = K^{\dagger}$, and hence that the pseudo-inverse yields the solution to the normal equations.

2. Here, the system has solutions of the form $u = K^{\dagger}f + z$ with $Kz = 0$. We find that $\|u\|_2^2 = \|K^{\dagger}f\|_2^2 + 2\langle K^{\dagger}f, z\rangle + \|z\|_2^2$. Expressing $K^{\dagger}f = V_m\Sigma_m^{-1}U^Tf$ in terms of the SVD we see that the middle term is given by $2\langle \Sigma_m^{-1}U^Tf, V_m^Tz\rangle$. Since $z$ lies in the null-space of $K$ we have $V_m^Tz = 0$, so $\langle K^{\dagger}f, z\rangle = 0$. We conclude that $\|u\|_2^2 = \|K^{\dagger}f\|_2^2 + \|z\|_2^2 \geq \|K^{\dagger}f\|_2^2$, so $K^{\dagger}f$ is indeed the solution with the smallest norm.

3. We can combine the results of the former two to show that in this case $K^{\dagger}f$ is a solution to the normal equations and that it is the one with the smallest norm.
2.4.3. Tikhonov regularization
Show that the solution to the variational problem

$$\min_u \|Ku - f\|_2^2 + \alpha\|u\|_2^2$$

is given in terms of the SVD of $K$ as

$$u_{\alpha} = \sum_{i}\frac{\sigma_i\langle u_i, f\rangle}{\sigma_i^2 + \alpha}v_i,$$

where $u_i$ and $v_i$ denote the left and right singular vectors of $K$.
Answer

First, write down the corresponding normal equations $(K^TK + \alpha I)u = K^Tf$. Substituting the SVD $K = U\Sigma V^T$ gives $V(\Sigma^T\Sigma + \alpha I)V^Tu = V\Sigma^TU^Tf$, so $u_{\alpha} = V(\Sigma^T\Sigma + \alpha I)^{-1}\Sigma^TU^Tf$, which written out per component is exactly the stated expression.
2.4.4. Gravity surveying
We can estimate the density distribution in the subsurface by measuring the local gravitational pull. The density profile $u(y)$, $y \in [0,1]$, is related to such measurements $f(x)$ via

$$f(x) = \int_0^1 \frac{u(y)}{\left(1 + (x-y)^2\right)^{3/2}}\,\mathrm{d}y.$$

Upon discretization with stepsize $h = 1/n$, this leads to a system of equations $Ku = f$ with $k_{ij} = h\left(1 + (x_i - y_j)^2\right)^{-3/2}$.

You can use the code provided below to generate the matrix and noisy data for a given $n$.

- Plot the coefficients $|\langle u_i, f\rangle|$ and $|\langle u_i, f^{\delta}\rangle|$ and the singular values to check the discrete Picard condition. What do you notice?
- Solve the inverse problem for noisy data using the (regularized) pseudo-inverse; compute the optimal $\alpha$ by computing the bias and variance components of the error. Is this a practically feasible way to compute the optimal $\alpha$?
- Compare the solutions for $\alpha = 10^{-6}$, $1.5\cdot 10^{-5}$ and $10^{-4}$ to the ground truth. What do you notice?
def getK(n):
    h = 1/n
    x = np.linspace(h/2,1-h/2,n)
    xx,yy = np.meshgrid(x,x)
    K = h/(1 + (xx - yy)**2)**(3/2)
    return K,x
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
# define forward operator
n = 100
delta = 1e-2
K,x = getK(n)
# define ground truth and compute data
u = np.sin(np.pi*x) + 0.5*np.sin(2*np.pi*x)
f = K@u
# add noise
noise = np.random.randn(n)
f_delta = f + delta*noise
# plot ground truth and data
fig, axs = plt.subplots(1,2)
axs[0].plot(x,u)
axs[0].set_xlabel('x')
axs[0].set_ylabel('u(x)')
axs[1].plot(x,f,label='clean data')
axs[1].plot(x,f_delta,label='noisy data')
axs[1].set_xlabel('x')
axs[1].set_ylabel('f(x)')
axs[1].legend()
fig.set_figwidth(10)
fig.tight_layout()
plt.show()

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
# define forward operator
def getK(n):
    h = 1/n
    x = np.linspace(h/2,1-h/2,n)
    xx,yy = np.meshgrid(x,x)
    K = h/(1 + (xx - yy)**2)**(3/2)
    return K,x
# parameters
n = 100
sigma = 1e-2
K,x = getK(n)
# define ground truth and compute data
u = np.sin(np.pi*x) + 0.5*np.sin(2*np.pi*x)
f = K@u
# add noise
np.random.seed(2)
noise = np.random.randn(n)
f_delta = f + sigma*noise
# SVD
U, s, Vh = np.linalg.svd(K, full_matrices=True)
# regularised pseudo inverse using Tikhonov
R_alpha = lambda alpha : Vh.T@np.diag(s/(s**2 + alpha))@U.T
# error, bias and variance for Tikhonov
na = 20
alpha = np.linspace(1e-6,2e-5,na)
error = np.zeros(na)
bias = np.zeros(na)
variance = np.zeros(na)
for k in range(na):
    error[k] = np.linalg.norm(u - R_alpha(alpha[k])@f_delta)
    bias[k] = np.linalg.norm(u - R_alpha(alpha[k])@f)
    variance[k] = np.linalg.norm(R_alpha(alpha[k])@(f-f_delta))
# plot
fig,ax = plt.subplots(1,3)
ax[0].semilogy(s,'*',label=r'$\sigma_i$')
ax[0].semilogy(np.abs(U.T@f),'o',label=r'$|\langle u_i, f\rangle|$')
ax[0].semilogy(np.abs(U.T@f_delta),'o',label=r'$|\langle u_i, f^\delta\rangle|$')
ax[0].set_xlabel(r'$i$')
ax[0].set_ylabel(r'magnitude')
ax[0].set_ylim([1e-16,2])
ax[0].set_xlim([0,n])
ax[0].set_aspect('auto')
ax[0].legend()
ax[1].plot(alpha,error,label='total error')
ax[1].plot(alpha,bias,label='bias')
ax[1].plot(alpha,variance,label='variance')
ax[1].set_xlabel(r'$\alpha \cdot 10^{5}$')
ax[1].set_ylabel(r'error')
ax[1].set_xticks([5e-6,1e-5,1.5e-5,2e-5])
ax[1].set_xticklabels(['.5','1.0','1.5','2.0'])
ax[1].set_aspect('auto')
ax[1].legend()
ax[2].plot(x,u,label='ground truth')
ax[2].plot(x,R_alpha(1e-6)@f_delta,label=r'$\alpha = 10^{-6}$')
ax[2].plot(x,R_alpha(1.5e-5)@f_delta,label=r'$\alpha = 1.5\cdot 10^{-5}$')
ax[2].plot(x,R_alpha(1e-4)@f_delta,label=r'$\alpha = 10^{-4}$')
ax[2].set_ylim([-2,2])
ax[2].set_xlabel(r'$x$')
ax[2].set_ylabel(r'$u(x)$')
ax[2].set_aspect('auto')
ax[2].legend()
fig.set_figwidth(10)
fig.tight_layout()

Answer
In the first figure we see that the Fourier coefficients of the noisy data definitely do not obey the discrete Picard condition. The Fourier coefficients of the noise-free data $f$ may even be problematic, as they are on the same level as the singular values for large $i$. This has to do with the numerical rank of $K$, which is roughly 20.

In the second figure we see the bias and variance parts of the error; the optimal $\alpha$ (where they are equal) is approximately $1.5\cdot 10^{-5}$. We cannot compute the optimal $\alpha$ in this way in practice, since we do not have access to the ground truth nor the noise-free data needed to compute the terms. You may also want to vary the random seed to explore the influence of the particular random instance of the noise. This further indicates how difficult it would be to choose an optimal value for $\alpha$ based only on knowledge of the statistics of the noise.

Comparing the solutions we note that for smaller $\alpha$ we get more oscillations in the solutions, while for larger $\alpha$ we get smoother solutions. Remarkably, we get a really good reconstruction for $\alpha = 1.5\cdot 10^{-5}$. This does not always have to be the case, as you can explore in the assignment below.
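The claim about the numerical rank can be checked directly by counting how many singular values of the gravity matrix lie above a chosen tolerance; the thresholds below are illustrative choices, not prescribed by the lecture:

```python
import numpy as np

n = 100
h = 1 / n
x = np.linspace(h / 2, 1 - h / 2, n)
xx, yy = np.meshgrid(x, x)
K = h / (1 + (xx - yy)**2)**(3 / 2)

s = np.linalg.svd(K, compute_uv=False)
for tol in [1e-10, 1e-12, 1e-16]:
    print(f"singular values above {tol:g} * sigma_1: {np.sum(s > tol * s[0])}")
```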
2.5. Assignments
2.5.1. Convolution
We are going to solve a deconvolution problem $Ku = f$, where $K$ is a Toeplitz matrix representing convolution with a Gaussian kernel (see the code below), and we are given noisy measurements

$$f^{\delta} = Ku + e,$$

where the entries of $e$ are normally distributed with mean zero and variance $\delta^2$.

The goal of this assignment is to solve this inverse problem using a (truncated) SVD for two scenarios:

1. $u(x) = 1$ for $|x - 1/2| < 0.2$ and $u(x) = 0$ otherwise (a block function),
2. $u(x) = x(1-x)$ (a smooth function),

where $x \in [0,1]$.
For each of the two scenarios answer the following questions:
1. Is this problem ill-posed?
2. Compute the (pseudo-)inverse of $K$ using the SVD and compute the backward error for several noise levels $\delta$ for both scenarios; what do you notice?
3. Compute a regularized solution using a truncated SVD for the same noise levels for both scenarios. Manually choose the truncation parameter $k$ in each case to get the best possible solution. What do you notice here?
4. Explain your observations by investigating what the singular vectors look like (think about the discrete Picard condition as well).
import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
# parameters and grid
n = 100
a = 100
delta = 1e-1
x = np.linspace(0,1,n)
# define forward operator
c = np.exp(-a*x**2)/((n-1)*np.sqrt(np.pi/a))
K = la.toeplitz(c)
# ground truth and data
u = abs(x - 0.5)<.2
f = K@u + delta*np.random.randn(n)
# plot
plt.plot(x,u,label='u')
plt.plot(x,f,label='f')
plt.legend()
plt.show()

# parameters and grid
n = 100
a = 100
delta = 1e-1
x = np.linspace(0,1,n)
# define forward operator
c = np.exp(-a*x**2)/((n-1)*np.sqrt(np.pi/a))
K = la.toeplitz(c)
# ground truth and data
u = x*(1-x)
f = K@u + delta*np.random.randn(n)
# plot
plt.plot(x,u,label='u')
plt.plot(x,f,label='f')
plt.legend()
plt.show()
