If my previous article was an introduction to the wonderful biology of the oceans, this one is its counterpart on the theoretical side: the mathematics! I know that this is much less glamorous for many people, but trust me, it can be as beautiful and interesting as the biology.

So today, I’ll try to cover the basics about what exactly I mean by ‘modeling’, and then share my personal thoughts about the one pitfall I try to avoid when doing it.

## Our very first model

In this blog, when I talk about modeling, I’ll mostly refer to *mathematical modeling*. That means that if
I were you, I wouldn’t expect small trains and boats in bottles, unfortunately! That being said, I don’t think
that what I do is that far from that type of modeling.

Just as a hobbyist modeler takes a complex object, downsizes it and simplifies it until it can fit miniature train tracks or bottles, a mathematical modeler takes complex events, then downsizes and simplifies them using mathematical language and symbolism. We’ll have the opportunity to discuss at length the different models I used or designed during my thesis, so let’s start with a simple but very useful example that is widely used across my field.

For this example, we’ll focus on the way heterotrophic bacteria (remember them?) eat. For now, I won’t go into the details of *what* they
eat exactly and will focus on the *how*, so we’ll say that on the one hand we have bacteria and on the other some vague
‘resource’ and leave it at that.

The first thing to figure out is exactly *what* we want to model. Here, let’s say we have noticed that bacteria feed
differently depending on the abundance of resources in the environment, with the same bacteria feeding *faster* in richer
environments. This seems to make sense, but we want to be more precise regarding the relationship between
the *feeding rate* of our bacteria and the *environment richness*.

But before we even think about modeling this relationship, we need at least a vague understanding of what
is going on, and that’s why we do experiments. After some thinking, we design the following protocol: we cultivate the same
bacterial population in multiple basins of ocean water kept at different constant levels of richness. We define environment richness by measuring the
*concentration* $C$ of our resource in the environment (meaning, how many grams of resources we find in one liter of ocean water). We then measure
the feeding rate $v$ of this population by measuring how many grams of
resource are consumed in every liter of ocean water in a single second.

We perform the experiment with 6 different concentrations, one with no resources and the others respectively at concentrations $C_1,C_2,C_3, C_4$ and $C_5$. For the population in an environment without resources, we find that bacteria will not feed, so we can note that for a concentration $C_0 = 0$, we find $v_0 = 0$. In the same manner, we measure the rates $v_1,v_2,v_3,v_4$ and $v_5$ for each concentration, and we represent the results in the following figure:

We then want to translate what we see into mathematical language, meaning we try to find an equation that links feeding rate
$v$ (in *grams* per *liter* per *second*, or $g\cdot L^{-1}\cdot s^{-1}$) to resource concentration $C$ (in $g\cdot L^{-1}$). We want to capture the fact that when
resource concentration increases, so does the feeding rate, and that a concentration of 0 implies a feeding rate of 0. We could for instance think of the
following *linear* relation:

$$ v = r \times C $$

with $r$ being a parameter we could measure.

To understand what’s going on here, let’s take an example. If we take $r=0.001~s^{-1}$, this means that in one liter of sea water kept at a concentration of $3~g\cdot L^{-1}$, our bacterial population will consume $3~mg$ of resource every second. We can verify that when $C$ increases, $v$ increases as well and that $C=0$ implies $v=0$, which is the behavior we tried to capture. But when we compare this model to the experiment, we find that apart from that, it doesn’t match reality very closely:

Here, the results from the experiment are represented in blue (as before), and the results that our model gives for each concentration are represented by the red line. We can clearly see that our model fails to capture a faithful representation of reality: it is true that when concentration increases, so does the feeding rate, but we can also see that the feeding rate reaches a maximum after a certain concentration, and does not increase any further. Think of it this way: when there is only one slice of pizza on the table, you can grab it fairly quickly and eat it (especially if you grew up with a lot of siblings!), whereas when there are dozens of slices, you can only grab a couple at a time to eat them, slowing you down.
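To make the linear relation concrete, here is a minimal Python sketch of it. The function name `linear_feeding_rate` is only an illustrative choice, and the large concentrations in the loop are made-up values chosen to show the problem, not measurements from the experiment.

```python
def linear_feeding_rate(concentration, r=0.001):
    """Linear model v = r * C.

    concentration: resource concentration C, in g/L
    r: proportionality constant, in 1/s (0.001 is the example value used in the text)
    Returns the feeding rate v, in g/L/s.
    """
    return r * concentration


# The example from the text: 3 g/L gives 0.003 g/L/s, i.e. 3 mg per liter per second.
v = linear_feeding_rate(3.0)
print(f"C = 3 g/L -> v = {v * 1000:.1f} mg/L/s")

# The flaw of this model: the rate keeps growing with the concentration, with no ceiling.
# These concentrations are hypothetical, picked only to illustrate the trend.
for c in (10.0, 100.0, 1000.0):
    print(f"C = {c:6.1f} g/L -> v = {linear_feeding_rate(c):.3f} g/L/s")
```

Whatever value we pick for $r$, the printed rate grows in proportion to the concentration, so a straight line like this can never level off the way the measurements do.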

As we can see, this behavior is not captured by our naive first model. That means that we must revise it! After some thinking, we come up with another (more complex) way of linking feeding rate $v$ to resource concentration $C$ with the following equation:

$$ v = v_m \times \frac{C}{K+C} $$

where $v_m$ and $K$ are *constants*, meaning that they keep the same value across our experiments. $K$ may look rather arbitrary, but it has a simple reading: it is the concentration at which the feeding rate reaches half of its maximum. As for $v_m$, it is the maximum feeding rate for our bacterial population. When there are enough resources to sustain everyone,

$$ \frac{C}{K+C} $$

gets closer and closer to 1 (regardless of the value of $K$), meaning that $v$ gets closer and closer to $v_m$ as we increase concentration but without ever going past it.
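The saturating behavior is easy to check numerically. The sketch below evaluates the equation for a few concentrations; the values of `V_MAX` and `K` are hypothetical, chosen only for illustration, and are not the ones from the experiment.

```python
def monod_feeding_rate(concentration, v_max, k):
    """Saturating model v = v_max * C / (K + C).

    concentration: resource concentration C, in g/L
    v_max: maximum feeding rate v_m, in g/L/s
    k: the constant K, in g/L
    """
    return v_max * concentration / (k + concentration)


# Hypothetical parameter values, for illustration only.
V_MAX = 0.005  # g/L/s
K = 2.0        # g/L

for c in (0.0, 2.0, 20.0, 200.0, 2000.0):
    v = monod_feeding_rate(c, V_MAX, K)
    print(f"C = {c:7.1f} g/L -> v = {v:.5f} g/L/s ({v / V_MAX:.1%} of v_m)")
```

Running it shows the rate starting at 0, sitting at exactly half of $v_m$ when $C = K$, and then creeping toward $v_m$ without ever reaching it.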

When we represent our model with our observations:

We’re pretty close if you ask me! That means that we’ve successfully completed our first model. Hurray! And let me tell you, this is not just any old equation, this is the Monod Equation, which is currently used in a wide variety of sophisticated models, some used to predict climate change impacts on the oceans in the IPCC Reports.
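If you are curious how $v_m$ and $K$ could be estimated from measurements, here is one possible sketch. It is not the procedure used for the figures in this article: it relies on a classical trick known as the Hanes-Woolf linearization, which rewrites the equation as $C/v = C/v_m + K/v_m$, so that plotting $C/v$ against $C$ gives a straight line with slope $1/v_m$ and intercept $K/v_m$. The data below are made up: they are generated from the equation itself with hypothetical parameters, so the fit should simply give those parameters back.

```python
def monod(concentration, v_max, k):
    """Saturating model v = v_max * C / (K + C)."""
    return v_max * concentration / (k + concentration)


def fit_monod(concentrations, rates):
    """Estimate (v_max, K) with a least-squares line through the points (C, C / v).

    Points with v == 0 are skipped because C / v is undefined there.
    """
    points = [(c, c / v) for c, v in zip(concentrations, rates) if v > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance          # equals 1 / v_max
    intercept = mean_y - slope * mean_x    # equals K / v_max
    v_max = 1.0 / slope
    return v_max, intercept * v_max


# Made-up, noise-free "measurements" (NOT the article's data), generated
# with hypothetical parameters v_max = 0.005 g/L/s and K = 2 g/L.
concentrations = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0]
rates = [monod(c, 0.005, 2.0) for c in concentrations]

v_max_hat, k_hat = fit_monod(concentrations, rates)
print(f"estimated v_m = {v_max_hat:.4f} g/L/s, estimated K = {k_hat:.2f} g/L")
```

With real, noisy measurements this linearization distorts the errors, so a direct nonlinear fit is usually a better choice. The sketch is only meant to show that two parameters are all we have to pin down.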

You may still have some reservations about the model we developed together, with some questions still unanswered. For instance, I told you that we were pretty close to reality with this model, that it was good enough, but who am I to say? Shouldn’t we strive for even more precision? You could also wonder how I decided to use this function involving $v_m$ and $K$. Why this one and not another? I must say, these are all very good questions, so let’s try to address them now.

## The most important question to have in mind when modeling

Once you know *what* you want to model, it is easy to lose sight of the big picture and immediately jump to the question of *how*. In a way, that
is exactly what we did today, but I believe this is the wrong way to go about modeling in general. The best question to ask yourself before delving into the
*how* is *why?*

This may seem obvious, and in a way it is, but it is very easy to skip steps and rush to a first functioning model. Personally, I constantly have to make a conscious effort to ask myself this question, at every step of my work.

So let’s start with this good habit. Why in the first place would we want to model *anything*? I believe there are three main reasons that can lead
us to consider modeling.

- *Description*: this is what we did today. We devised a model that helps us describe a natural behavior, but without delving into the mechanisms that lead to said behavior. For instance, our use of this particular equation involving $v_m$ and $K$ was motivated by its simplicity and its agreement with observations.
- *Explanation*: sometimes, merely describing what we see in mathematical terms isn’t enough, and we try to understand the mechanisms that lead to the observations. For instance, the Lotka-Volterra equations do not merely describe the observed dynamics between prey and predators, but directly model the interactions between these populations. We can then compare the results of our model to the observations, and if they match, we can build confidence that our model captures at least some part of reality. The fact that the Lotka-Volterra equations lead to cycles is not baked into the model by the modeler, but is an *emergent property* of the system. And it so happens that these cycles can be observed in nature, hinting that our model captures something real.
- *Prediction*: this may be the most straightforward use of models in your mind. Models are indeed used to predict some really important things in everyday life, such as climate change or the spread of a pandemic (yes, we made it almost two whole articles without stating the obvious).
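To see what an emergent property looks like in practice, here is a small simulation sketch of the Lotka-Volterra equations in their classic textbook form (they are not written out in this article): prey grow at rate `alpha` and are eaten at rate `beta`, while predators grow by eating prey at rate `delta` and die at rate `gamma`. All parameter values and initial populations are hypothetical, chosen only for illustration, and the integration uses a standard fourth-order Runge-Kutta scheme.

```python
def lotka_volterra(prey, predator, alpha, beta, delta, gamma):
    """Return the growth rates (d prey / dt, d predator / dt)."""
    d_prey = alpha * prey - beta * prey * predator
    d_predator = delta * prey * predator - gamma * predator
    return d_prey, d_predator


def simulate(prey, predator, params, dt=0.001, steps=30000):
    """Integrate the equations with classical Runge-Kutta (RK4) steps."""
    trajectory = [(prey, predator)]
    for _ in range(steps):
        k1 = lotka_volterra(prey, predator, *params)
        k2 = lotka_volterra(prey + 0.5 * dt * k1[0], predator + 0.5 * dt * k1[1], *params)
        k3 = lotka_volterra(prey + 0.5 * dt * k2[0], predator + 0.5 * dt * k2[1], *params)
        k4 = lotka_volterra(prey + dt * k3[0], predator + dt * k3[1], *params)
        prey += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        predator += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        trajectory.append((prey, predator))
    return trajectory


# Hypothetical parameters (alpha, beta, delta, gamma) and initial populations.
PARAMS = (1.0, 0.1, 0.075, 1.5)
trajectory = simulate(prey=10.0, predator=5.0, params=PARAMS)

# Count how many times the prey population rises through its equilibrium
# value gamma / delta: each such crossing marks the start of a new cycle.
prey_equilibrium = PARAMS[3] / PARAMS[2]
upward_crossings = sum(
    1
    for (before, _), (after, _) in zip(trajectory, trajectory[1:])
    if before < prey_equilibrium <= after
)
print(f"prey crossed its equilibrium upward {upward_crossings} times")
```

Nothing in these two equations says "oscillate", yet the prey population keeps rising and falling around its equilibrium: the cycles come out of the interaction itself.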

Of course, these three broad categories are intertwined. For instance, let’s go back to the models in the IPCC Reports: they use the
model we discovered today (a *descriptive* model) nested inside a bigger model that is designed to *explain* the changes we’ve observed for decades, but of
course the end goal is to *predict* future changes.

Once we know *why* we want to model a certain system, we can start to work on the how. But this doesn’t mean that we’re completely done with the *why*! Indeed,
as I said I believe that we should ask ourselves this question at every step of the model. Why do I model this part of the system in a certain way and not
another? Why do I stop there and not make my model more precise?

These are all fundamental questions, and the answer may be a little frustrating: it all depends on the question you’re asking! A good rule of thumb would be to make the model as simple as possible while still capturing the phenomenon you’re studying. If we go back to our Monod equation, this rule can help us understand why we don’t strive to be more precise and capture every little variation we observed in our experiment. We aimed to capture the general trend of a growth reaching a plateau, and our equation does so with only two parameters (which is not a lot!). If it was good enough for Nobel Prize winner Jacques Monod, you can bet it’s good enough for me!

## Conclusion

There you have it, a ‘hands-on’ introduction to mathematical modeling in biology. But what comes after you have designed your model? Well, now comes the time for analysis, trying to understand what our model tells us about the world, and how it behaves in different conditions. While this part is certainly the most time-consuming aspect of my Ph.D., I truly believe that in terms of importance, analysis comes second to the modeling. Indeed, to adequately model a system, one has to understand it, and that is the most exciting part of my job.

That is the wonder I’ll try to communicate when writing about models, this feeling that with just a few equations, we managed to capture a glimpse of reality. And if you’re still not a fan of the mathematics, worry not! In our next article, we’ll come back to the no less wonderful world of marine biology.

## If you want to go deeper

- Müller, J., & Kuttler, C. (2015). Methods and models in mathematical biology. Lecture Notes on Mathematical Modelling in the Life Sciences, Springer, Heidelberg, Germany. https://doi.org/10.1007/978-3-642-27251-6