Measurements¶
import matplotlib.pyplot as plt
import numpy as np
import ramanspy as rp
import pint
import scipy
u = pint.UnitRegistry()
Composition of a medium¶
One often needs to determine the concentration of cells and medium components with a volume. Depending on the bioloical or chemical species of interest there is a variety of measurement techniques available.
We will discuss two of them within this lecture:
Raman spectroscopy
Optical density and the related techniques of spectrophotometry and flow cytometry
A third technique will be discussed in the next lecture:
- Microscopy
Raman spectroscopy¶
Quick overview on how it works¶
For a schematic setup of the process see this wikimedia schematic_(2019)_1662-1683.png).
- Photons are projected onto a material.
- Most photons are elastically scattered by the material without (or with almost no, as in an elastic collision with a huge mass) energy change. Only the direction changes due to the collision.
- Some photons, however, are observed to have a different energy level and thus a different frequency. This is due to vibrational changes within the material, resp., its chemical bonds, induced by the collision.
- The second type of photons are collected and their counts and wavenumber ($\text{cm}^{-1}$) are reported. In fact the difference in the wavenumber of the received photon and the emmited photon is reported; resulting in a Raman spectrum. This difference is called the Raman shift ($\text{cm}^{-1}$).
The wavenumber $\tilde\nu$ [$\text{m}^{-1}$] is related to the wavelength $\lambda$ [$\text{m}$] via
$$\tilde\nu = \frac{1}{\lambda}$$
From the relation between frequnecy $\nu$ [$\text{s}^{-1}$] and the speed of a photon $c$ [$ms^{-1}$], we have
$$\tilde\nu = \frac{\nu}{c}$$
The wavenumber is convenient as it is directly porportional to the frequency and the energy of a photon via the Planck relation
$$E = h\nu = hc\tilde\nu$$
A first synthetic spectrum¶
We will use the Python library RamanSPy to generate a first synthetic spectrum. It shows peaks of intensity at certain wavelengths, that partially overlap creating this charactersitic spectrum. The generated spectrum does not have a backround signal. We will discuss this later in the lecture, but neglect it for the moment, asssuming that it has already been removed.
# Generate synthetic spectrum
spectra = rp.synth.generate_spectra(num_spectra=1, n_bands=1000, realistic=True, seed=10)
rp.plot.spectra(spectra)
rp.plot.show()
To analyse such a spectrum, we start with a simplifed notion fo a spectrum as a sum of Gaussian functions, each of which correspondis to an idealized peak of amplitude $1$. Letting $x$ be the Raman shift, a Guassian peak is a spectrum of the form
$$ f(x) = a \cdot e^{-(x-\mu)/(2 w^2)} $$
and we refer to $a$ as the amplitude, $\mu$ the mean, and $w$ the width. Let's code a function that creates a peak. For simplicity we will assume here that our spectra all are between 0 and 1000 $\text{cm}^{-1}$ and comprise of 1000 bins, i.e., the numerical resolution is $1 \text{cm}^{-1}$.
def peak(pos: float, width: float) -> np.ndarray:
# for simplicity we set spectral range and sample number
# to constants
x = np.linspace(0,100, 1000)
# Gauss function, with amplitude 1.
return 1 * np.exp( -(x - pos)**2 / (2*width**2) )
We can now create a simplified first spectrum comprising peaks at 50 and 75 of widths 1 and 3, respectively.
# simplified spectrum s1
s1 = peak(50,1) + 1/5*peak(75,3)
plt.figure()
plt.plot(s1)
plt.title("spectrum s1")
plt.xlabel("Raman shift")
plt.ylabel("Intensity")
Text(0, 0.5, 'Intensity')
Let's create a second
# simplified spectrum s2
s2 = 0.3*peak(20,2) + 1.3*peak(60,1)
plt.figure()
plt.plot(s2, 'r')
plt.title("spectrum s2")
plt.xlabel("Raman shift")
plt.ylabel("Intensity")
Text(0, 0.5, 'Intensity')
It's time to overlay both into a single spectrum. We assume the (bio-)chemical species responsible for spectrum $s_1$ is available in concentration $c_1 = 0.2$ and the species for $s_2$ in $c_2 = 0.5$.
def mix(spec: list[np.ndarray], coefficients: list[float]) -> np.ndarray:
ret = None
for i, s in enumerate(spec):
ret = coefficients[i] * s + (0 if ret is None else ret)
if ret is None:
raise ValueError("expecting at least 1 spectrum")
return ret
Resulting in the following spectrum (in black). The two original spectra are shown in blue and red.
mixed = mix([s1, s2], [0.2, 0.5])
plt.figure()
plt.plot(s1, 'b--')
plt.plot(s2, 'r--')
plt.plot(mixed, 'k')
plt.title("spectrum mixed")
plt.xlabel("Raman shift")
plt.ylabel("Intensity")
Text(0, 0.5, 'Intensity')
The problem we would like to solve is finding the two coefficients $c_1$ and $c_2$ back from the mixed spectrum, assuming that we know it is a mix of $s_1$ and $s_2$. We can phrase the problem as solving the least square problem
$$ \min_c \|Sc - m\|_2 $$
where
$$ S = \begin{bmatrix} {\vec s_{1}} & \vec s_{2} \\ \end{bmatrix} $$
$$ c = \begin{bmatrix} c_{1}\\ c_{n} \end{bmatrix} $$
and $m$ is the mexied spectrum.
def demix(components: list[np.ndarray], mixed: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
S = np.vstack(components).T
c, residuals, _, _ = np.linalg.lstsq(a=S, b=mixed, rcond=None)
return c, residuals
c, residuals = demix(components=[s1, s2], mixed=mixed)
print(f"The coefficients are {c} and the sums of squared residuals is {residuals}.")
The coefficients are [0.2 0.5] and the sums of squared residuals is [3.816829e-30].
So, indeed, we sucessfully found back the two concentrations.
More involved spectra¶
The above example was well behaved with respect to several properties:
- All peaks are well seperated and do not overlap. Neither between spectra, nor within a spectrum.
- We knew exactly which species are present.
- A perfectly clean spectrum was assumed: no background signal and no measurment noise is assumed.
We start by elevating (1) and (2). The next mixed spectrum has more involved spectra with overlapping peaks between and within a spectrum. Further, we are not sure if two other candidate species are present.
s3 = 0.3*peak(20,2) + 1.3*peak(60,1) + 1.2*peak(67,1) + 0.5*peak(70,4)
s4 = 0.2*peak(15,3) + 0.2*peak(62,1) + 1.0*peak(64,1) + 0.1*peak(80,1)
s5 = 1.0*peak(10,1) + 1.2*peak(40,1) + 0.1*peak(44,1)
s6 = 0.5*peak(16,1) + 1.0*peak(22,4) + 0.7*peak(80,2)
mixed = mix([s3, s4, s5, s6], [0.2, 0.5, 0.0, 0.0])
plt.figure()
plt.plot(s3, 'b--')
plt.plot(s4, 'r--')
plt.plot(s5, 'g--')
plt.plot(s6, 'y--')
plt.plot(mixed, 'k')
plt.title("spectrum mixed")
plt.xlabel("Raman shift")
plt.ylabel("Intensity")
c, residuals = demix(components=[s3, s4, s5, s6], mixed=mixed)
print(f"The coefficients are {c} and the sums of squared residuals is {residuals}.")
The coefficients are [ 2.00000000e-01 5.00000000e-01 -4.28245569e-17 -9.07151373e-17] and the sums of squared residuals is [2.00698394e-30].
Again, we found back the initial concentrations.
def make_noisy(spec: list[np.ndarray], factor=0.1) -> list[np.ndarray]:
ret = []
for s in spec:
gaussian_multiplicative_noise = np.random.normal(loc=1.0, scale=factor, size=spec[0].shape)
ret.append(s * gaussian_multiplicative_noise)
return ret
s3 = 0.3*peak(20,2) + 1.3*peak(60,1) + 1.2*peak(67,1) + 0.5*peak(70,4)
s4 = 0.2*peak(15,3) + 0.2*peak(62,1) + 1.0*peak(64,1) + 0.1*peak(80,1)
s5 = 1.0*peak(10,1) + 1.2*peak(40,1) + 0.1*peak(44,1)
s6 = 0.5*peak(16,1) + 1.0*peak(22,4) + 0.7*peak(80,2)
# make noisy
s3_noisy, s4_noisy, s5_noisy, s6_noisy = make_noisy([s3, s4, s5, s6])
# use the real / not noisy compounds to create the spectrum
mixed = mix([s3, s4, s5, s6], [0.2, 0.5, 0.0, 0.0])
# and make it noisy
mixed_noisy = make_noisy([mixed])[0]
plt.figure()
plt.plot(s3_noisy, 'b--')
plt.plot(s4_noisy, 'r--')
plt.plot(s5_noisy, 'g--')
plt.plot(s6_noisy, 'y--')
plt.plot(mixed_noisy, 'k')
plt.title("spectrum mixed, with noise")
plt.xlabel("Raman shift")
plt.ylabel("Intensity")
c, residuals = demix(components=[s3_noisy, s4_noisy, s5_noisy, s6_noisy], mixed=mixed_noisy)
print(f"The coefficients are {c} and the sums of squared residuals is {residuals}.")
The coefficients are [ 2.03043721e-01 4.78771439e-01 5.85396227e-05 -1.27394654e-04] and the sums of squared residuals is [0.20819475].
Still a very good match of coefficients.
Bacterial spectra¶
Following the tutorial on RamanSPy we next download a dataset of bacterial spectra obtained by Ho et al. (Nat Commun, 2019). Following the RamanSPy - loading bacterial data tutorial, we get:
# change this to the downloaded data dir
data_dir = r"data"
# get the training data
X_train, y_train = rp.datasets.bacteria("val", folder=data_dir)
y_labels, _ = rp.datasets.bacteria("labels")
spectra = [[X_train[y_train == species_id]] for species_id in list(np.unique(y_train))]
spectra_norm = rp.preprocessing.normalise.MinMax().apply(spectra)
# plot all spectra
plt.figure(figsize=(6.5, 9))
rp.plot.mean_spectra(spectra_norm, label=y_labels, plot_type="single stacked", title="bacterial spectra");
plt.figure()
idx_ecoli = 3
rp.plot.mean_spectra(spectra_norm[idx_ecoli], label=y_labels[idx_ecoli], plot_type="single stacked", title="bacterial spectra");
spectra_norm[idx_ecoli]
[<ramanspy.core.SpectralContainer at 0x15de8c470>]
# the two components
plt.figure()
ecoli_spectrum = spectra[idx_ecoli][0][0]
rp.plot.spectra(ecoli_spectrum)
plt.title("E. coli spectrum")
plt.figure()
calbicans_spectrum = spectra[0][0][0]
rp.plot.spectra(calbicans_spectrum)
plt.title("C. albicans spectrum")
# creating a mixed one
to_mix = [ecoli_spectrum, calbicans_spectrum]
cs = np.array([0.2, 0.5])
mixed_bac = rp.synth.mix(to_mix, cs, mixture_mode='linear', noise=False, baseline=False, seed=42)
plt.figure()
rp.plot.spectra(mixed_bac)
plt.title("mixed bacterial spectrum")
Text(0.5, 1.0, 'mixed bacterial spectrum')