Chemical Reaction Networks¶
The course contains material derived from the course Biological Circuit Design by Michael Elowitz and Justin Bois, 2020 at Caltech.
The original course material has been changed by Matthias Fuegger and Thomas Nowak.
This lecture covers:
Concepts
- Chemical reaction networks (CRNs) to describe species and reactions among them
- Determinstic ODE kinetics of CRNs
- Stochastic Markov chain kinetics of CRNs
- Their link via the Langevin equation
Chemical Reactions Networks¶
A chemical reaction network is described by a set $\mathcal{S}$ of species and a set $\mathcal{R}$ of reactions. A reaction is a triple $(\mathbf{r}, \mathbf{p}, \alpha)$ where $\mathbf{r}, \mathbf{p}\in \mathbb{N}_0^{\mathcal{S}}$ and $\alpha\in\mathbb{R}_{\geq0}$. The species with positive count in $\mathbf{r}$ are called the reaction's reactants and those with positive count in $\mathbf{p}$ are called its products. The parameter $\alpha$ is called the reaction's rate constant.
The dynamical equations¶
For simple protein production, we have the following species
- DNA. We assume there is a single promoter followed by a protein-coding gene in the cell
- mRNA, where $m$ is the current number of mRNA corresponding to the above gene
- protein, where $p$ is the current number of proteins corresponding to the above gene
as well as the following reactions among them: \begin{align} \text{DNA} &\rightarrow \text{mRNA} + \text{DNA} &\text{(transcription)}\\ \text{mRNA} &\rightarrow \emptyset &\text{(mRNA degradation and dilution)}\\ \text{mRNA} &\rightarrow \text{protein} + \text{mRNA} &\text{(translation)}\\ \text{protein} &\rightarrow \emptyset &\text{(protein degradation and dilution)} \end{align}
Macroscale equations (deterministic, ODE semantics)¶
As we've seen before, the deterministic dynamics, which describe mean concentrations over a large population of cells, are described by the ODEs
\begin{align} \frac{\mathrm{d}m}{\mathrm{d}t} &= \beta_m - \gamma_m m, \\[1em] \frac{\mathrm{d}p}{\mathrm{d}t} &= \beta_p m - \gamma_p p. \end{align}
In general, the rate of a reaction $(\mathbf{r}, \mathbf{p}, \alpha)$ at time $t$ is $\alpha \prod_{s\in \mathcal{S}} s(t)^{\mathbf{r}(s)}$ where $s(t)$ is the concentration of species $s$ at time $t$. The differential equation for a species $s \in \mathcal{S}$ is: $$ \frac{ds}{dt} = \sum_{(\mathbf{r}, \mathbf{p}, \alpha) \in \mathcal{R}} (\mathbf{p}(s) - \mathbf{r}(s)) \cdot \alpha \prod_{\tilde{s}\in \mathcal{S}} \tilde{s}(t)^{\mathbf{r}(\tilde{s})} $$
The Chemical Master equation (stochastic, Markov chain semantics)¶
We can write a master equation for these dynamics. In this case, each state is defined by an mRNA copy number $m$ and a protein copy number $p$. States can transition to other states with rates. We assume that the state fully described the system state. That is the probability to transition from a state $(m,p)$ to a state $(m',p')$ with in an infinitesimal time $\Delta t$ is independent if how long our system already is in state $(m,p)$. It is approximately $\Delta t \cdot \gamma_i(m,p)$, where $\gamma_i(m,p)$ is the rate at which reaction $i$ happens if in state $(m,p)$.
The following image shows state transitions and their corresponding reactions for large enough $m$ and $p$. Care has to be taken at the boundaries, e.g., if $m = 1$ or $m = 0$.
Denote by $P(m, p, t)$ the probability that the system is in state $(m,p)$ at time $t$. Then, by letting $\Delta t \to 0$, it is
\begin{align} \frac{\mathrm{d}P(m,p,t)}{\mathrm{d}t} &= \beta_m P(m-1,p,t) & \text{(from left)}\\ &+ (m+1)P(m+1,p,t) & \text{(from right)}\\ &+\beta_p mP(m,p-1,t) & \text{(from bottom)}\\ &+ \gamma (p+1)P(m,p+1,t) &\text{(from top)}\\ &- mP(m,p,t) & \text{(to left)}\\ &- \beta_m P(m,p,t) & \text{(to right)}\\ &- \gamma p P(m,p,t) &\text{(to bottom)}\\ &- \beta_p mP(m,p,t)\enspace. & \text{(to top)} \end{align}
We implicitly define $P(m, p, t) = 0$ if $m < 0$ or $p < 0$. This is the master equation we will sample from using the stochastic simulation algorithm (SSA) also called Gillespie algorithm.
The Gillespie algorithm¶
Propensity¶
The rhs terms in the above equation are familiar to us: they almost look like reaction rates with a single difference of not being functions of concentrations, but of species counts. For example, a reaction $$ A + B \rightarrow C $$ with mass-action kinetics and rate constant $\gamma$ in units of $\text{L} s^{-1}$ has rate $$ \gamma \cdot [A] \cdot [B] $$ in units of $\text{L}^{-2} \cdot \text{L} \text{s}^{-1} = \text{L}^{-1} \text{s}^{-1}$. Concentrations may as well be given in molar units.
By contrast, the propensity of the reaction is $$ \gamma' \cdot A \cdot B $$ in units of $\text{s}^{-1}$, where $\gamma' = \gamma / \text{vol}$ is in units of $\text{s}^{-1}$ and vol is the volume of the compartment in which the reactions happen.
The propensity for a given transition/reaction, say indexed $i$, is denoted as $a_i$. The equivalence to notation we introduced for master equations is that if transition $i$ results in the change of state from $n'$ to $n$, then $a_i = W(n\mid n')$.
In general, the propensity of a reaction $(\mathbf{r}, \mathbf{p}, \alpha)$ at time $t$ is $$ \frac{\alpha}{v^{o-1}} \prod_{s\in \mathcal{S}} \binom{s(t)}{\mathbf{r}(s)} $$ where $s(t)$ is the count of species $s$ at time $t$, $v$ is the volume, and $o = \sum_{s\in\mathcal{S}} \mathbf{r}(s)$ is the order of the reaction. Its effect is subtracting $\mathbf{r}(s)$ from the count of species $s$ and adding $\mathbf{p}(s)$ to the count of species $s$.
Switching states: transition probabilities and transition times¶
To cast this problem for a Gillespie simulation, we can write each change of state (moving either the copy number of mRNA or protein up or down by 1 in this case) and their respective propensities.
\begin{align} \begin{array}{ll} \text{reaction, }r_i & \text{propensity, } a_i \\ m \rightarrow m+1,\;\;\;\; & \beta_m \\[0.3em] m \rightarrow m-1, \;\;\;\; & m\\[0.3em] p \rightarrow p+1, \;\;\;\; & \beta_p m \\[0.3em] p \rightarrow p-1, \;\;\;\; & \gamma p\enspace. \end{array} \end{align}
We will not carefully prove that the Gillespie algorithm samples from the probability distribution governed by the master equation, but will state the principles behind it. The basic idea is that events (such as those outlined above) are rare, discrete, separate events. I.e., each event is an arrival of a Poisson process. The Gillespie algorithm starts with some state, $(m_0,p_0)$. Then a state change, any state change, will happen in some time $\Delta t$ that has a certain probability distribution (which we will show is exponential momentarily).
transition probabilities¶
The probability that the state change that happens is because of reaction $j$ is proportional to $a_j$. That is to say, state changes with high propensities are more likely to occur. Thus, choosing which of the $k$ state changes happens in $\Delta t$ is a matter of drawing an integer $j \in [1,k]$ where the probability of drawing $j$ is
\begin{align} \frac{a_j}{\sum_i a_i}\enspace. \end{align}
transition times¶
Now, how do we determine how long the state change took? Let $T_i(m,p)$ be the stochastic variable that is the time that reaction $i$ occurs in state $(m,p)$, given that it is reaction $i$ that results in the next state. The probability density function $p_i$ for the stochastic variable $T_i$, is \begin{align} p_i(t) = a_i\, \mathrm{e}^{-a_i t}\enspace, \end{align} for $t \geq 0$, and $0$ otherwise. This is known as the exponential distribution with rate parameter $a_i$ (related, but not equal to the rate of the reaction).
The probability that it has not occurred by time $\Delta t$, is thus \begin{align} P(T_i(m,p) > \Delta t \mid \text{reaction } r_i \text{ occurs}) = \int_{\Delta t}^\infty p_i(t) \mathrm{d}t = \mathrm{e}^{-a_i \Delta t}\enspace. \end{align}
However, in state $(m,p)$ there are several reactions that may make the system transition to the next state. Say we have $k$ reactions that arrive at times $t_1, t_2, \ldots$. When does the first one of them arrive?
The probability that none of them arrive before $\Delta t$ is \begin{align} P(t_1 > \Delta t \wedge t_2 > \Delta t \wedge \ldots) &= P(t_1 > \Delta t) P(t_2 > \Delta t) \cdots = \prod_i \mathrm{e}^{-a_i \Delta t} = \mathrm{exp}\left(-\Delta t \sum_i a_i\right)\enspace. \end{align} This is the equal to $P(T(m,p) > \Delta t \mid \text{reaction } R \text{ occurs})$ for a reaction $R$ with propensity $\sum_i a_i$. For such a reaction the occurrence times are exponentially distributed with rate parameter $\sum_i a_i$.
The algorithm¶
So, we know how to choose a state change and we also know how long it takes. The Gillespie algorithm then proceeds as follows.
- Choose an initial condition, e.g., $m = p = 0$.
- Calculate the propensity for each of the enumerated state changes. The propensities may be functions of $m$ and $p$, so they need to be recalculated for every $m$ and $p$ we encounter.
- Choose how much time the reaction will take by drawing out of an exponential distribution with a mean equal to $\left(\sum_i a_i\right.)^{-1}$. This means that a change arises from a Poisson process.
- Choose what state change will happen by drawing a sample out of the discrete distribution where $P_i = \left.a_i\middle/\left(\sum_i a_i\right)\right.$. In other words, the probability that a state change will be chosen is proportional to its propensity.
- Increment time by the time step you chose in step 3.
- Update the states according to the state change you choose in step 4.
- If $t$ is less than your pre-determined stopping time, go to step 2. Else stop.
Gillespie proved that this algorithm samples the probability distribution described by the master equation in his seminal papers in 197690041-3) and 1977. (We recommend reading the latter.) You can also read a concise discussion of how the algorithm samples the master equation in Section 4.2 of Del Vecchio and Murray.
The Chemical Langevin Equation¶
The Chemical Master Equation can be difficult to solve, which is why several approxmations were developed. One of these approximations is the Chemical Langevin Equation, which we will now derive. It is perfectly useful on its own, in particular from a computational perspective. We present it here for another reason as well: to show that the ODE kinetics of a CRN are an approximation to the expected stochastic kinetics, at least for linear CRNs.
It starts
$$ X_i(t + \tau) = X_i(t) + \sum_{j=1}^M \nu_{j,i} K_j(X(t), \tau) $$ where $\nu_{j,i}$ is the net change in the count of species $i$ in reaction $j$ and $K_j(X(t), \tau)$ is the random variable that specifies how many times reaction $j$ occurs in the time interval $[t, t+\tau]$.
Assumption 1 of the chemical Langevin equation is that the propensity functions do not change significantly during the time interval $[t, t+\tau]$, i.e., $a_j(X(t')) \approx a_j(X(t))$ for all $t' \in [t,t+\tau]$.
Then, we can rewrite the above equation as $$ X_i(t + \tau) = X_i(t) + \sum_{j=1}^M \nu_{j,i} \mathcal{P}_j(a_j(X(t)) \cdot \tau) $$ where the $\mathcal{P}_j(a_j(X(t)) \cdot \tau)$ are independent Poisson variables with parameter $a_j(X(t)) \cdot \tau$.
Assumption 2 of the chemical Langevin equation requires the expected number of occurrences of each reaction to be big, i.e., $a_j(X(t))\cdot \tau \gg 1$.
Note that the two assumption require a tradeoff: assumption 1 wants $\tau$ to be small whereas assumption 2 wants $\tau$ to be big. It is very well possible that no choice of $\tau$ satisifies both assumptions for a given system.
It is well-known that the Poisson distribution with parameter $\lambda$ is well approximated by the normal distribution $\mathcal{N}(\lambda,\lambda)$ with expected value $\mu = \lambda$ and variance $\sigma^2 = \lambda$ for large values of $\lambda$.
Assumption 2 thus suggests using this approximation to get $$ \mathcal{P}_j(a_j(X(t)) \tau) \approx a_j(X(t))\tau + \sqrt{a_j(X(t))\tau} \cdot \mathcal{N}(0,1) $$ where $\mathcal{N}(0,1)$ is a standard normally distributed random variable.
Now, switching to $X(t)$ being real-valued and setting $\tau = dt$, in the limit $dt \to 0$, we finally get the chemical Langevin equation: $$ \frac{dX_i(t)}{dt} = \sum_{j=1}^M \nu_{j,i} a_j(X(t)) + \sum_{j=1}^M \nu_{j,i} \sqrt{a_j(X(t))} \Gamma_j(t) $$ where the $\Gamma_j$ are independent standard Gaussian white noise processes.
From Stochastic to Deterministic Kinetics¶
If the assumptions of the chemical Langevin equation hold sufficiently well, it is easy to make the connection between the stochastic and the deterministic kinetics of CRNs. In this case, taking the expected value (over an ensemble of sample paths) on both sides of the equation, we get: $$ \frac{d \mathbb{E} X_i(t)}{dt} = \sum_{j=1}^M \nu_{j,i} \mathbb{E} a_j(X(t)) $$
If the propensities follow a mass-action law, the expected value $\mathbb{E} a_j(X(t))$ is sometimes an approximation to $a_j(\mathbb{E} X(t))$. If there are only unary reactions, they are equal for instance.
The detour via the chemical Langevin equation is not the only way to show the above formula. For a large-volume (or large-species-count) limit, it can be derived directly from the Chemical Master Equation.