# Homework: The lac Operon

## Background

The lactose (lac) operon of *Escherichia coli* is one of the most studied examples of gene regulation and serves as a paradigm for understanding how cells control gene expression in response to environmental signals. The operon consists of three structural genes (*lacZ*, *lacY*, and *lacA*) that encode proteins for lactose uptake and metabolism, most notably β-galactosidase (LacZ), which cleaves lactose into glucose and galactose.

The regulation of the lac operon involves two key mechanisms that work in concert:

**Negative regulation (repression):** The LacI repressor protein, which is expressed constitutively, binds to the operator region of the lac operon in the absence of inducer and physically blocks RNA polymerase from transcribing the lac genes. When lactose is present, a fraction of it is converted to allolactose, which binds to LacI and causes a conformational change that releases LacI from the operator, allowing transcription to proceed. This mechanism ensures that the lac genes are not expressed when lactose is absent.

**Positive regulation (activation):** The CAP-cAMP complex (catabolite activator protein bound to cyclic AMP) binds near the promoter and enhances RNA polymerase binding and transcription initiation. The concentration of cAMP is inversely related to glucose availability—when glucose is abundant, cAMP levels are low; when glucose is scarce, cAMP levels rise. This mechanism ensures that even if lactose is present, the lac genes are only highly expressed when glucose (the preferred carbon source) is depleted.

This dual control creates a sophisticated logic: the lac operon is highly expressed only when lactose is present AND glucose is absent. This allows *E. coli* to preferentially use glucose when available and only invest in lactose metabolism machinery when it is both needed and advantageous. The phenomenon where glucose presence reduces lac operon expression even in the presence of lactose is called **catabolite repression** or **glucose repression**.

Understanding this system requires considering the binding kinetics of regulatory proteins to DNA, the stochastic nature of molecular interactions at low copy numbers, and the mathematical approximations that allow us to model gene regulation efficiently.

---

## Problem 1: Building and Simulating a Lac Operon Model

### Part (a): Model Development and Parameterization

Construct a Chemical Reaction Network (CRN) model of the lac operon that includes repression, activation, and induction. Your model should include the key molecular species (at minimum: DNA operator site, LacI repressor, inducer, CAP-cAMP complex, mRNA, and protein) and define the reactions between them. These reactions should capture LacI binding and unbinding to the operator, inducer binding to LacI, CAP-cAMP binding to DNA, transcription with state-dependent rates, translation, and degradation of mRNA and protein.

Find parameters from [Yildirim & Mackey](https://www.sciencedirect.com/science/article/pii/S0006349503700137). This paper provides a comprehensive parameter set. You may also consult the [BioNumbers database](https://bionumbers.hms.harvard.edu/) for typical *E. coli* parameters such as mRNA half-life (around 2-5 minutes), protein dilution rate (growth rate ≈ 0.02 min⁻¹ for doubling time of around 35 minutes), and typical transcription rates (around 40-60 nucleotides/second).
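As a quick sanity check on the numbers quoted above, half-lives and doubling times can be converted to first-order rate constants with $k = \ln 2 / t_{1/2}$. The sketch below is only illustrative; the helper name `rate_from_half_life` is an assumption, not something taken from the cited paper.

```python
import math

def rate_from_half_life(t_half_min):
    """First-order rate constant (min^-1) for a half-life given in minutes."""
    return math.log(2) / t_half_min

# Dilution by growth: a 35-minute doubling time gives roughly 0.02 min^-1.
dilution_rate = rate_from_half_life(35.0)

# mRNA degradation rates for the 2-5 minute half-life range quoted in the text.
mrna_deg_fast = rate_from_half_life(2.0)
mrna_deg_slow = rate_from_half_life(5.0)

print(f"dilution: {dilution_rate:.4f} min^-1")
print(f"mRNA degradation: {mrna_deg_slow:.3f} to {mrna_deg_fast:.3f} min^-1")
```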

Create a table listing all reactions with their rate laws, parameter values (with units), and citations. For example, LacI typically exists at around 10-40 tetramers per cell, and the dissociation constant for LacI-operator binding is approximately 10⁻¹⁰ M.

Document the key assumptions you made in constructing your model. Which reactions did you assume are in equilibrium versus kinetically controlled? Assume a typical *E. coli* cell volume of 1 fL (10⁻¹⁵ L). Did you model LacI as a monomer or account for its tetrameric structure? Did you explicitly model all three operator sites (O1, O2, O3) or use a simplified single-operator representation? What biological details did you simplify and why?
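Because the parameter table mixes molar concentrations and copy numbers, it helps to convert between them for the assumed 1 fL volume. The following is a minimal sketch; the function names are hypothetical, and the only inputs are the cell volume, LacI copy numbers, $K_d$, and diffusion-limited on-rate quoted in this assignment.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
CELL_VOLUME_L = 1e-15      # 1 fL, the cell volume assumed in the text

def molar_to_copies(conc_molar, volume_l=CELL_VOLUME_L):
    """Convert a molar concentration to molecules per cell."""
    return conc_molar * AVOGADRO * volume_l

def copies_to_molar(copies, volume_l=CELL_VOLUME_L):
    """Convert molecules per cell to a molar concentration."""
    return copies / (AVOGADRO * volume_l)

# One molecule per cell corresponds to a concentration in the nanomolar range.
one_copy_nM = copies_to_molar(1) * 1e9

# LacI at 10-40 tetramers per cell, expressed in nM.
laci_range_nM = [copies_to_molar(n) * 1e9 for n in (10, 40)]

# K_d = 1e-10 M expressed as an equivalent copy number (well below one molecule).
kd_copies = molar_to_copies(1e-10)

# A bimolecular on-rate (M^-1 s^-1) rescaled to per-molecule units (s^-1),
# which is the form a stochastic simulation in copy numbers needs.
k_on_per_molecule = 1e8 / (AVOGADRO * CELL_VOLUME_L)

print(f"1 copy = {one_copy_nM:.2f} nM; K_d = {kd_copies:.3f} copies")
print(f"LacI: {laci_range_nM[0]:.1f} to {laci_range_nM[1]:.1f} nM")
print(f"k_on per molecule = {k_on_per_molecule:.3f} s^-1")
```

The conversion makes one modeling issue concrete: a $K_d$ far below one molecule per cell means the operator is almost always occupied when even a few active repressors are present.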

Provide a clear diagram of your reaction network, a complete parameter table with references, and a discussion of your key modeling assumptions.

---

### Part (b): Demonstrating Regulation Through Simulation

Design and execute simulations that demonstrate how the lac operon responds to different environmental conditions. Simulate these four scenarios:

| Scenario | Inducer (IPTG) | Glucose | cAMP level | Expected Outcome |
|----------|----------------|---------|------------|------------------|
| 1 | Absent (0 µM) | Present | Low | Full repression |
| 2 | Present (1 mM) | Present | Low | Partial expression |
| 3 | Present (1 mM) | Absent | High | Full activation |
| 4 | Absent (0 µM) | Absent | High | Basal expression |

You can model glucose/cAMP effects by varying the concentration or activity of the CAP-cAMP complex. When glucose is present, set CAP-cAMP to a low value (e.g., 1 µM); when glucose is absent, set it to a higher value (e.g., 10 µM). For inducer, IPTG (isopropyl β-D-1-thiogalactopyranoside) is commonly used experimentally at concentrations of 10 µM to 1 mM.

For each scenario, run your model for at least 200 minutes (approximately 5-6 cell generations assuming a 35-minute doubling time) to ensure the system approaches steady state. Create time course plots showing mRNA copy number and protein (β-galactosidase) copy number versus time. Also make a bar chart comparing final steady-state expression levels across all four conditions.
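One possible starting point for these simulations is a reduced ODE model in which promoter occupancy is treated as being in quasi-equilibrium. The sketch below is not the model from Yildirim & Mackey; every parameter value, the Hill-type IPTG term, and the names `transcription_rate` and `simulate` are illustrative assumptions that you should replace with your own documented choices. It uses a plain forward-Euler loop so that it runs with the standard library alone.

```python
# All values are illustrative placeholders, not fitted or published parameters.
PARAMS = {
    "laci_nM": 33.0,     # about 20 tetramers in 1 fL
    "K_op_nM": 0.1,      # LacI-operator K_d (1e-10 M)
    "K_iptg_uM": 10.0,   # hypothetical IPTG half-inactivation constant
    "K_cap_uM": 5.0,     # hypothetical CAP-cAMP binding constant
    "basal": 0.05,       # hypothetical relative activity without CAP-cAMP
    "k_tx": 1.0,         # maximal transcription rate (mRNA/min), placeholder
    "k_tl": 20.0,        # translation rate (protein/mRNA/min), placeholder
    "g_m": 0.3,          # mRNA degradation (1/min)
    "g_p": 0.02,         # protein dilution (1/min)
}

# Scenario number -> (IPTG in uM, CAP-cAMP in uM), following the table above.
SCENARIOS = {1: (0.0, 1.0), 2: (1000.0, 1.0), 3: (1000.0, 10.0), 4: (0.0, 10.0)}

def transcription_rate(iptg_uM, cap_uM, p):
    """Quasi-equilibrium transcription rate (assumed functional form)."""
    active_laci = p["laci_nM"] / (1.0 + (iptg_uM / p["K_iptg_uM"]) ** 2)
    p_unrepressed = 1.0 / (1.0 + active_laci / p["K_op_nM"])
    cap_occupancy = cap_uM / (p["K_cap_uM"] + cap_uM)
    activity = p["basal"] + (1.0 - p["basal"]) * cap_occupancy
    return p["k_tx"] * p_unrepressed * activity

def simulate(iptg_uM, cap_uM, p, t_end=200.0, dt=0.01):
    """Forward-Euler integration of mRNA and protein; returns final values."""
    mrna, protein = 0.0, 0.0
    tx = transcription_rate(iptg_uM, cap_uM, p)
    for _ in range(int(round(t_end / dt))):
        d_mrna = tx - p["g_m"] * mrna
        d_protein = p["k_tl"] * mrna - p["g_p"] * protein
        mrna += dt * d_mrna
        protein += dt * d_protein
    return mrna, protein

final = {s: simulate(iptg, cap, PARAMS) for s, (iptg, cap) in SCENARIOS.items()}
for s in sorted(final):
    print(f"scenario {s}: mRNA = {final[s][0]:.3f}, protein = {final[s][1]:.1f}")
```

For the actual assignment you would record the full time course rather than only the final values, and a library ODE solver is a reasonable replacement for the Euler loop.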

Discuss the outcomes of your simulations. Compare your predicted protein copy numbers to experimental measurements. Fully induced cells typically produce 1000-5000 β-galactosidase molecules per cell, while repressed cells have fewer than 10. Does your model reproduce the ~1000-fold dynamic range observed experimentally? Does the catabolite repression effect (scenario 2 vs 3) show the expected 5-10 fold reduction? What are the implications for cellular metabolism—why does the cell benefit from this regulatory logic?

---

### Part (c): Deterministic vs. Stochastic Comparison

Compare deterministic (ODE) and stochastic (Gillespie SSA) simulations to understand the role of molecular noise in the lac operon. Select scenario 2 or 3 from part (b). Run a deterministic simulation using numerical integration of ODEs. Run a stochastic simulation using the Gillespie algorithm with 100 independent trajectories starting from the same initial conditions.
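A minimal sketch of the Gillespie direct method is shown below for a two-stage (mRNA and protein) model with a constant transcription rate. This is a deliberate simplification: promoter binding and unbinding are omitted, and the rate values are hypothetical placeholders chosen only so that the example runs quickly. The function name `gillespie_two_stage` is likewise an assumption.

```python
import random

def gillespie_two_stage(k_tx, g_m, k_tl, g_p, t_end, rng):
    """Gillespie direct method; returns (mRNA, protein) counts at t_end."""
    t, mrna, protein = 0.0, 0, 0
    while True:
        # Propensities: transcription, mRNA decay, translation, protein loss.
        a = (k_tx, g_m * mrna, k_tl * mrna, g_p * protein)
        total = a[0] + a[1] + a[2] + a[3]
        t += rng.expovariate(total)          # waiting time to the next event
        if t > t_end:
            return mrna, protein
        u = rng.random() * total             # choose which reaction fires
        if u < a[0]:
            mrna += 1
        elif u < a[0] + a[1]:
            mrna -= 1
        elif u < a[0] + a[1] + a[2]:
            protein += 1
        elif protein > 0:                    # guard against rounding at the boundary
            protein -= 1

# Placeholder rates in 1/min; the deterministic steady state is
# k_tx/g_m = 2 mRNA and k_tx*k_tl/(g_m*g_p) = 200 proteins.
rng = random.Random(0)                       # fixed seed for reproducibility
final_states = [gillespie_two_stage(0.6, 0.3, 2.0, 0.02, 200.0, rng)
                for _ in range(100)]
final_proteins = [p for _, p in final_states]
mean_protein = sum(final_proteins) / len(final_proteins)
print(f"mean protein over {len(final_proteins)} runs: {mean_protein:.1f}")
```

To produce the overlay plots, the function would need to record the state at each event (or on a fixed time grid) instead of returning only the final counts.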

Create comparison figures that overlay the deterministic trajectory with 10-20 randomly selected stochastic trajectories to visualize the noise. Also show a histogram of the steady-state protein copy numbers from all 100 stochastic runs, and compare this to the single deterministic steady-state value.

Calculate the coefficient of variation (CV = σ/μ, where σ is standard deviation and μ is mean) and the Fano factor (F = σ²/μ, which equals 1 for Poisson processes) to quantify the noise level in protein expression.
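Both statistics follow directly from the sample mean and variance. The sketch below uses Python's `statistics` module; the helper name `noise_stats` and the three-element sample are purely illustrative.

```python
import math
import statistics

def noise_stats(samples):
    """Return (mean, CV, Fano factor) using the sample variance."""
    mu = statistics.fmean(samples)
    var = statistics.variance(samples)   # unbiased sample variance
    return mu, math.sqrt(var) / mu, var / mu

# Toy data only; in the assignment, pass the steady-state protein counts
# collected from your stochastic trajectories.
mean, cv, fano = noise_stats([8, 10, 12])
print(f"mean = {mean:.1f}, CV = {cv:.2f}, Fano = {fano:.2f}")
```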

Analyze your observations. Stochastic effects are most significant when molecule copy numbers are low. Note that mRNA typically has 1-10 copies per cell while proteins may have 100-1000 copies. You should observe larger relative fluctuations in mRNA than in protein due to the difference in copy numbers. The protein acts as a "low-pass filter" that averages out fast mRNA fluctuations.

What are the implications for cell-to-cell variability? In a clonal population of genetically identical cells grown in the same environment, individual cells will show different expression levels due to stochastic effects. Would this noise be beneficial (allowing bet-hedging strategies) or detrimental (reducing precision of metabolic responses)? Consider that noise in the lac operon has been measured experimentally and shows CV values around 0.1-0.3 for protein levels.

---

## Problem 2: Understanding Dual Regulation

### Part (a): The Functional Logic of Activation + Repression

Analyze why the lac operon employs both activation and repression rather than a single regulatory mechanism. Using your model from Problem 1, calculate the fold-change in protein expression between the fully activated state (scenario 3: inducer present, glucose absent) and the fully repressed state (scenario 1: no inducer, glucose present). This is the **dynamic range** of the system.

Compare this to hypothetical systems where you modify your model to remove one mechanism:
- **Repression only:** Remove the CAP-cAMP activation effect by setting CAP-cAMP to a constant saturating concentration in all scenarios. Calculate the fold-change between induced (scenarios 2,3) and uninduced (scenarios 1,4).
- **Activation only:** Remove the LacI repression by setting LacI concentration to zero. Calculate the fold-change between high cAMP (scenarios 3,4) and low cAMP (scenarios 1,2).

The wild-type lac operon achieves approximately 1000-fold regulation. Compare this to what your hypothetical single-mechanism systems achieve. Does combining both mechanisms multiply the fold-changes, or is there a different relationship?

Create a truth table for the lac operon with two Boolean inputs (lactose present: yes/no, and glucose absent: yes/no) and one Boolean output (high lac expression: yes/no). The correct logic should show high expression only when lactose is present AND glucose is absent, implementing an AND logic gate. Verify that your simulation results match this logic.
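The truth table and the comparison against simulation output can be generated programmatically. In the sketch below, the expression levels and the threshold are made-up placeholders (not simulation results), and `matches_and_logic` is a hypothetical helper name.

```python
from itertools import product

def and_gate(lactose_present, glucose_absent):
    """Expected Boolean output of the lac operon logic."""
    return lactose_present and glucose_absent

def matches_and_logic(expression, threshold):
    """Check thresholded expression levels against the AND gate.

    `expression` maps (lactose_present, glucose_absent) -> protein level.
    """
    return all((level >= threshold) == and_gate(*inputs)
               for inputs, level in expression.items())

print("| Lactose present | Glucose absent | High lac expression |")
print("|-----------------|----------------|---------------------|")
for lactose, no_glucose in product([False, True], repeat=2):
    print(f"| {lactose!s:15} | {no_glucose!s:14} | {and_gate(lactose, no_glucose)!s:19} |")

# Placeholder levels for illustration; substitute your own steady-state values.
example_levels = {(False, False): 1.0, (True, False): 300.0,
                  (False, True): 3.0, (True, True): 1000.0}
print(matches_and_logic(example_levels, threshold=500.0))
```

Note that the verdict depends on where the threshold is placed: partial expression under catabolite repression counts as "low" only if the threshold sits above it, so state the threshold you use.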

Explain why expressing lac genes when glucose is available would be wasteful from an energetic and fitness perspective. Glucose enters glycolysis directly, whereas lactose must first be imported by the permease and cleaved by β-galactosidase, and the resulting galactose must be converted through additional enzymatic steps before it can be used. When glucose is present, the cell therefore gains little from the lactose machinery and should prioritize glucose utilization. Catabolite repression ensures this metabolic hierarchy. Additionally, synthesizing the lac proteins (each β-galactosidase monomer requires ~1000 amino acids) is costly in terms of ribosomes, amino acids, and energy.

Connect this to DNA occupancy states. The promoter region can be in four states: free of both regulators (basal activity), bound by LacI only (repressed), bound by CAP-cAMP only (fully active), or bound by both LacI and CAP-cAMP (repressed despite the activator). Under different nutrient conditions, what fraction of time does the promoter spend in each state? How does dual regulation affect these occupancy probabilities?
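If LacI and CAP-cAMP are assumed to bind independently and at equilibrium, the occupancy probabilities follow from simple statistical weights. The sketch below makes exactly those assumptions; the binding constants are placeholders and the function name is hypothetical. A real model might add cooperativity or the auxiliary operators.

```python
def promoter_state_probabilities(laci_active_nM, cap_uM,
                                 K_op_nM=0.1, K_cap_uM=5.0):
    """Equilibrium occupancy of four promoter states (independent binding assumed)."""
    r = laci_active_nM / K_op_nM     # statistical weight of LacI binding
    c = cap_uM / K_cap_uM            # statistical weight of CAP-cAMP binding
    z = 1.0 + r + c + r * c          # partition function, equals (1 + r)(1 + c)
    return {
        "free": 1.0 / z,
        "LacI only": r / z,
        "CAP-cAMP only": c / z,
        "LacI + CAP-cAMP": r * c / z,
    }

# Illustrative inputs: uninduced with glucose, and induced without glucose.
repressed = promoter_state_probabilities(laci_active_nM=33.0, cap_uM=1.0)
induced = promoter_state_probabilities(laci_active_nM=0.0033, cap_uM=10.0)
for label, probs in (("repressed", repressed), ("induced", induced)):
    print(label, {state: round(p, 4) for state, p in probs.items()})
```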

Provide a quantitative comparison of dynamic ranges in a table showing fold-changes for: wild-type, repression-only, and activation-only systems. Include the AND logic truth table and discuss the functional advantages of dual regulation.

---

### Part (b): Binding Kinetics vs. Equilibrium

Test whether binding and unbinding rates matter beyond their equilibrium ratio. Even when the dissociation constant $K_d = k_\mathrm{off}/k_\mathrm{on}$ is held constant, the absolute values of $k_\mathrm{on}$ and $k_\mathrm{off}$ should affect system dynamics, particularly in the time domain and under stochastic conditions.

Use your lac operon model and focus on LacI binding to the operator. In *E. coli*, the LacI-operator interaction is characterized by very fast association (diffusion-limited, $k_\mathrm{on}$ ≈ 10⁸ M⁻¹s⁻¹) and relatively slow dissociation ($k_\mathrm{off}$ ≈ 10⁻² s⁻¹), giving $K_d$ ≈ 10⁻¹⁰ M.

Define three parameter sets that all maintain $K_d$ = 10⁻¹⁰ M while the kinetic rates vary 100-fold:
- **Fast binding:** $k_\mathrm{on}$ = 10⁸ M⁻¹s⁻¹, $k_\mathrm{off}$ = 0.01 s⁻¹
- **Medium binding:** $k_\mathrm{on}$ = 10⁷ M⁻¹s⁻¹, $k_\mathrm{off}$ = 0.001 s⁻¹
- **Slow binding:** $k_\mathrm{on}$ = 10⁶ M⁻¹s⁻¹, $k_\mathrm{off}$ = 0.0001 s⁻¹

Note: In each set, $k_\mathrm{on}$ and $k_\mathrm{off}$ are scaled by the same factor, so their ratio stays fixed at $K_d$ = 10⁻¹⁰ M. Pairing the same on-rates with $k_\mathrm{off}$ = 1, 0.1, and 0.01 s⁻¹ would instead give $K_d$ = 10⁻⁸ M, which is not the intended comparison.
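Before running any simulation, it is worth verifying that the candidate rate pairs really share one $K_d$. The check below uses the three pairs with $k_\mathrm{off}$ = 0.01, 0.001, and 0.0001 s⁻¹; the dictionary name and layout are illustrative. It also prints the mean operator dwell time $1/k_\mathrm{off}$, which differs 100-fold across the sets.

```python
import math

# (k_on in M^-1 s^-1, k_off in s^-1) for each kinetic regime.
PARAMETER_SETS = {
    "fast": (1e8, 1e-2),
    "medium": (1e7, 1e-3),
    "slow": (1e6, 1e-4),
}

kd_values = {name: k_off / k_on for name, (k_on, k_off) in PARAMETER_SETS.items()}
dwell_times_s = {name: 1.0 / k_off for name, (_, k_off) in PARAMETER_SETS.items()}

for name in PARAMETER_SETS:
    print(f"{name:6s}: K_d = {kd_values[name]:.1e} M, "
          f"mean bound dwell time = {dwell_times_s[name]:.0f} s")

same_kd = all(math.isclose(kd, 1e-10, rel_tol=1e-9) for kd in kd_values.values())
print("all sets share K_d = 1e-10 M:", same_kd)
```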

Run deterministic simulations starting with the repressor bound to the operator (repressed state). At t = 100 seconds, suddenly introduce saturating inducer (1 mM IPTG) which rapidly inactivates LacI. Track the mRNA and protein production as the system transitions from repressed to induced. Measure the **response time** defined as the time to reach 50% of the new steady-state protein level.
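The response time can be extracted from a sampled trajectory by locating the first crossing of the 50% level and interpolating linearly. The sketch below assumes the last sample is already close to the new steady state; `response_time` is a hypothetical helper, and the exponential test curve is synthetic, not output from a lac operon model.

```python
import math

def response_time(times, values, t_induce, fraction=0.5):
    """Time after t_induce at which `values` first reaches `fraction` of its final value.

    Assumes values[-1] approximates the new steady state.
    Returns None if the level is never crossed.
    """
    target = fraction * values[-1]
    for i in range(1, len(times)):
        if times[i] > t_induce and values[i - 1] < target <= values[i]:
            # Linear interpolation between the two bracketing samples.
            w = (target - values[i - 1]) / (values[i] - values[i - 1])
            return times[i - 1] + w * (times[i] - times[i - 1]) - t_induce
    return None

# Synthetic check: first-order rise with rate 0.02 after induction at t = 10.
t_induce, rate = 10.0, 0.02
times = [0.1 * i for i in range(4001)]
values = [0.0 if t < t_induce else 1000.0 * (1.0 - math.exp(-rate * (t - t_induce)))
          for t in times]
t_half = response_time(times, values, t_induce)
print(f"measured response time: {t_half:.2f} (ln2/rate = {math.log(2) / rate:.2f})")
```

The function is unit-agnostic, so it works whether your time axis is in seconds or minutes as long as `t_induce` uses the same unit.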

Do all three cases reach the same steady state? (They should, since $K_d$ is identical.) How do the response times differ? Fast unbinding (high $k_\mathrm{off}$) should allow quicker escape from the repressed state, while slow unbinding creates a lag.

Run stochastic simulations (50 trajectories for each parameter set) at steady state in the induced condition. Calculate the coefficient of variation (CV) for protein copy numbers. Does the noise level depend on the binding kinetics even though $K_d$ is the same?

Fast binding with high $k_\mathrm{on}$ and $k_\mathrm{off}$ means the repressor rapidly samples bound and unbound states, creating many brief transcriptional bursts. Slow binding with low $k_\mathrm{on}$ and $k_\mathrm{off}$ means longer dwell times in each state, creating fewer but longer bursts. These different temporal patterns can produce different noise levels even at the same mean expression.
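For intuition about how switching speed affects noise, one can use the standard closed-form result for the two-state ("telegraph") promoter model at the mRNA level: with activation rate $k_a$, inactivation rate $k_i$, transcription rate $k_\mathrm{tx}$ in the active state, and mRNA decay rate $\gamma_m$, the Fano factor is $1 + k_\mathrm{tx} k_i / \big((k_a + k_i)(k_a + k_i + \gamma_m)\big)$. Applying this to the lac operon is an assumption of this sketch: it treats the operator as a single site, maps repressor unbinding to $k_a$ and binding to $k_i$, ignores protein, and uses placeholder rate values.

```python
def telegraph_mrna_stats(k_activate, k_inactivate, k_tx, g_m):
    """Mean and Fano factor of mRNA in the two-state promoter model."""
    switching = k_activate + k_inactivate
    mean = k_tx * k_activate / (switching * g_m)
    fano = 1.0 + k_tx * k_inactivate / (switching * (switching + g_m))
    return mean, fano

# Placeholder rates in s^-1. Scaling both switching rates by the same factor
# keeps the promoter's active fraction (and hence the mean) unchanged.
BASE_ACTIVATE, BASE_INACTIVATE = 0.01, 0.01
K_TX, G_M = 0.02, 0.005

results = {}
for scale in (0.01, 1.0, 100.0):
    results[scale] = telegraph_mrna_stats(BASE_ACTIVATE * scale,
                                          BASE_INACTIVATE * scale, K_TX, G_M)
    mean, fano = results[scale]
    print(f"switching x{scale:<6g}: mean mRNA = {mean:.2f}, Fano = {fano:.3f}")
```

In this simplified model the mean is identical across the three cases while the Fano factor is not, which is the kind of kinetic effect your stochastic simulations should be able to confirm or refute for the full model.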

Interpret your results biologically. Fast response times (high $k_\mathrm{off}$) are advantageous when nutrient availability changes rapidly, allowing the cell to quickly adapt. Binding kinetics also affect noise, and the direction of that effect is worth checking against your simulations: in simple two-state promoter models, slower switching tends to give longer dwell times and larger fluctuations, while faster switching is averaged out. Consider whether your model shows a trade-off between response speed and expression precision. Slow binding kinetics can also create cellular "memory" where the current state depends on recent history, which could be beneficial or detrimental depending on environmental dynamics.

Discuss when equilibrium approximations (which only depend on $K_d$) are sufficient versus when explicit kinetic modeling (accounting for individual rate constants) is necessary. For steady-state predictions, equilibrium is often adequate. For dynamics, noise, and transient responses, kinetics matter.

---
**License**: © 2025 Matthias Függer and Thomas Nowak. Licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).