How Do Recruitment Deviations Work?

Published: January 25, 2021

For an in-depth description of the theory and math behind recruitment deviations in assessment models, check out Methot, R.D., Taylor, I.G., Chen, Y., 2011. Adjusting for bias due to variability of estimated recruitments in fishery assessment models. Can. J. Fish. Aquat. Sci. 68, 1744–1760. https://doi.org/10.1139/f2011-092.

Here I want to jot down the practical implementation of recruitment deviations for those working in Stock Synthesis or building their own models from scratch. The main topics to discuss are 1) the meaning of a recruitment deviation (abbreviated here to “rec-dev”), 2) how it enters your dynamics equations, and 3) keeping an eye out for lognormal bias correction.

What is a rec-dev?

A recruitment deviation is how much a given year’s recruitment deviates from what you’d expect given a stock-recruit relationship (in this post I’ll use the Beverton-Holt as an example). The idea is that there could be many reasons for a stock to produce fewer or more recruits than you’d expect based on the current stock size. These reasons could be environmental, and are generally unexplained – which is why the time series of rec-devs is often interpreted as process error in your system (all the extra natural noise our model can’t capture).

Let’s quickly refresh what we’re talking about when we say the “mean” or expected recruitment level. Again, using the Bev-Holt as an example:

\[ R_{expected} = \frac{4hR_0 SSB}{SSB_0(1-h)+ SSB(5h-1)} \]

\(h\) is steepness, or the expected proportion of \(R_0\) produced at \(0.2SSB_0\); \(R_0\) and \(SSB_0\) are virgin recruitment and virgin spawning biomass, respectively. Given known (likely estimated) values for these parameters, the “mean” or expected recruitment level is what you’d get by plugging those numbers into the equation and reading off the curve:
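As a concrete sketch, here is the Beverton-Holt above as an R function. The parameter values are illustrative choices of mine, not from any real stock; the two checks at the bottom follow directly from the definition of steepness.

```r
## Beverton-Holt expected recruitment, steepness parameterization
bev_holt <- function(SSB, R0, SSB0, h) {
  (4 * h * R0 * SSB) / (SSB0 * (1 - h) + SSB * (5 * h - 1))
}

R0   <- 1000   # virgin recruitment (illustrative)
SSB0 <- 5000   # virgin spawning biomass (illustrative)
h    <- 0.7    # steepness (illustrative)

## sanity checks implied by the definition of steepness:
bev_holt(SSB0, R0, SSB0, h)        # returns R0 at virgin biomass
bev_holt(0.2 * SSB0, R0, SSB0, h)  # returns h * R0 at 20% of SSB0
```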

Yet observed recruitment in a given year may deviate from the black curve. The new, observed recruitments are shown as gold points; the dotted line separating each point from the expected value is the raw value of the deviation itself (in numbers).

Let’s think for a second. The raw (absolute) value of a deviation at high biomass is going to be much larger than the deviation at low biomass. This isn’t very helpful if we’re trying to quantify the degree to which our stock’s recruitment in year \(y\) diverges from expectation. For that reason, we model the errors, or deviations, as lognormally distributed. Equation-wise, here’s what that looks like. Note that \(R_{det}\) is simply the deterministic value of recruitment, which could come from a Bev-Holt, Ricker, you name it.

\[ R_y = R_{det,y}e^{(dev_y-\sigma^2/2)} \]

Now we’ll quickly refresh the idea behind the lognormal distribution and explain what’s up with that \(-\sigma^2/2\).

Refreshing the lognormal distribution

Check out [towardsdatascience.com](https://towardsdatascience.com/log-normal-distribution-a-simple-explanation-7605864fb67c) for a good background read.

If our deviations vector \(\vec r\) is normally distributed with mean zero, \(exp(\vec r)\) is lognormally distributed, shown in blue below:

If \(r_y \sim N(\mu=0,\sigma=1)\) and \(Y = e^{r_y}\), the expected value of \(r_y\) is \(\mu\) (zero), and the expected value of \(Y\) will be \(e^{\mu+\sigma^2/2}\) (1.6487213), not \(e^{\mu}\).

Note that \(e^{r_y}\), the blue values above, are 1) never less than zero and 2) asymmetrically distributed. What this means mathematically is that the expected value, or mean, of \(e^{r_y}\) (blue dashed line above) is in fact not the same as \(exp(mean(r_y))\). Confirm this for yourself below.

## X is a draw of normally distributed deviations, e.g.:
## X <- rnorm(1e4, mean = mu, sd = sigma), with mu = 0 and sigma = 1
## should be close to 0
mean(X) 
[1] 0.04794604
## should be close to 1
exp(mean(X))
[1] 1.049114
## yet the mean of the blue distribution is larger...
mean(exp(X)) 
[1] 1.797997
##... in fact, the expected value of exp(X) is ~exp(mu+sigma^2/2)
exp(mu+sigma^2/2)
[1] 1.648721

That’s fine statistically, but it makes for a problem if we calculate our annual recruitment by multiplying \(R_{det}\) by \(exp(r_y)\): we would end up with inflated average recruitments, because \(exp(r_y)\) is on average greater than 1 (even though its median is 1).

Bias correction to the rescue

We set the mean of \(X\) to 0 in hopes that, in an “average” year, we’ll be multiplying \(R_{det,y}\) (deterministic recruitment in year y) by \(exp(0) = 1\) and get the expected value back. However, because \(exp(r_y)\) is not symmetric, we need a bias correction to ensure that, on average, we return \(R_{det,y}\). In other words, after back-transforming (exponentiating) the corrected deviations, their mean should be roughly one.

Check it out:

## as above, the mean is close to 1 without adjustment
exp(mean(X)) 
[1] 1.049114
## ...though the actual mean is greater, by a factor of exp(sigma^2/2)
mean(exp(X)) 
[1] 1.797997
## with bias correction (here sigma = 1), closer to 1 again!
mean(exp(X - sigma^2/2))
[1] 1.09054
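Putting the pieces together, here’s a minimal sketch of generating a bias-corrected recruitment series; `sigma` and `R_det` are illustrative values of my choosing, not from any real assessment.

```r
## bias-corrected lognormal rec-devs applied to a deterministic recruitment
set.seed(1)
sigma <- 0.6                            # assumed recruitment variability
R_det <- 800                            # assumed deterministic recruitment
devs  <- rnorm(1e5, mean = 0, sd = sigma)
R_y   <- R_det * exp(devs - sigma^2 / 2)

mean(R_y)    # close to R_det: the correction preserves the mean
median(R_y)  # below R_det: the corrected series has a lower median
```

Note the mean/median split in the output: preserving the mean at \(R_{det}\) pushes the median below it, which is the trade-off the bias correction deliberately accepts.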

One last thing. Normally we present a time series of rec-devs, and leave them in log space (to get around the scale issue I mentioned above).

Imagine we had a time series of data encompassing all the different \(SSB\) values corresponding to the gold points above; we could then plot the deviations through time (I randomly reordered the values):

So now, the horizontal line at zero represents the deterministic or expected recruitment, and the points are the log deviations from that mean. You can interpret this plot intuitively: values below zero were “worse than average” years, and vice versa. What’s more, because we are working in log space, you also get a quantitative sense of how divergent from expectation the recruits were in a given year, regardless of the stock size at hand.
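A plot like the one described above can be sketched in base R as follows; the deviations here are simulated with an assumed sigma rather than estimated from data.

```r
## sketch a rec-dev time series in log space
## (devs simulated with an assumed sigma; in practice they are model estimates)
set.seed(2)
yrs  <- 1990:2020
devs <- rnorm(length(yrs), mean = 0, sd = 0.5)

plot(yrs, devs, type = "h",
     xlab = "Year", ylab = "log recruitment deviation")
abline(h = 0, lty = 2)  # zero line = deterministic (expected) recruitment
```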

Bonus: why this isn’t perfect

This came up on my General Exam for my PhD. I was asked to walk through the basic logic of the recruitment deviation and why we apply bias correction. Then I was asked, “why is this wrong?”.

Part of the answer lies in the fact that this is a conscious decision made by fisheries population dynamicists to make parts of our models behave the way we want. Specifically, after applying our bias adjustment factor, the resulting recruitment time series will have a median value that falls below the deterministic curve shown above. In the Methot & Taylor paper linked up top, they note that this is sensible since the average recruitment is most representative of a recruitment event’s long-term contribution to the population, “because most of the population biomass comes from the numerous recruits in the upper tail of the lognormal distribution”. If you’re working with a different data type or hope to learn from the median behavior of this process, such an adjustment is not for you.

Bonus 2: A note on lognormal CVs

This came up during some work on VAST. If you have low log-SDs (i.e., <0.1), the log-SD is basically identical to the CV – the log-SD is the SD of the index in log space, and it makes sense if you’re fitting the index with a lognormal distribution. Otherwise, here’s how to convert a lognormal mean and standard deviation to a CV on the normal scale.

Consider a random variable X that follows a lognormal distribution: \(X \sim Lognormal(\mu, \sigma)\)

We know that when converting into non-log space, the mean of \(X\) is \(\mu_X = e^{\mu + \sigma^2/2}\) and the standard deviation is \(\sigma_X = \sqrt{e^{\sigma^2}-1}\, e^{\mu + \sigma^2/2}\) (check the lognormal wiki page under “Arithmetic moments” for details).

The coefficient of variation (CV) on the original scale is defined as the ratio of the standard deviation to the mean of the random variable X: \(CV_{original} = \frac{\sigma_X}{\mu_X}\)

Via substitution:

\(CV_{original} = \frac{\sqrt{e^{\sigma^2}-1}\, e^{\mu + \sigma^2/2}}{e^{\mu + \sigma^2/2}}\)

Cancel the common term \(e^{\mu + \sigma^2/2}\) in the numerator and denominator, leaving:

\(CV_{original} = \sqrt{e^{\sigma^2}-1}\)
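The result above can be wrapped in a small helper (the function name is my own):

```r
## CV on the natural scale implied by a lognormal log-scale SD;
## note the result does not depend on mu
lognormal_cv <- function(sigma) sqrt(exp(sigma^2) - 1)

lognormal_cv(0.05)  # ~0.05: for small log-SDs, CV is basically the log-SD
lognormal_cv(1.0)   # ~1.31: at larger sigma the approximation breaks down
```

The first call illustrates the claim up top: below a log-SD of about 0.1 the two quantities are interchangeable for practical purposes.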