Putting ABC in Phylogenetic Comparative Methods

Mohak Sharda
May 29, 2021


Employing the Approximate Bayesian Computation approach in Phylogenetic Comparative Methods

Phylogenetic Comparative Methods (PCMs)

When did humans develop the ability to walk on two feet, compared to their ancestors? Did this happen because of a change in lifestyle from arboreal to terrestrial? If so, did this transition universally change all terrestrial animals? Did certain mechanisms (more than others) help humans develop bipedalism?

Phylogenetic comparative methods (PCMs) let us ask such questions.

What are PCMs? Let’s break it down.

In the context of evolutionary biology, the origin of all extant and extinct species can be traced back to one common ancestor, billions of years ago. This historical relationship can be represented in the form of what is called the Tree of Life (Figure 1). The study of how organisms are related to each other in this tree of life is called phylogenetics.

Figure 1: The tree of life

Phylogenetic comparative methods (PCMs) are a set of tools — at the intersection of mathematics, biology and computer science — that extend from the field of phylogenetics. They allow us to compare different sets of species and populations throughout evolution and ask questions like: 1) what is the extent of similarity and/or dissimilarity between two or more groups of species? 2) is an evolutionary process, contributing to the formation of new species, specific to certain organisms or can it be generalised? More specifically, one can ask questions like — did the ability to fly evolve among organisms once or multiple times throughout history (Figure 1)?

Conventional methods in PCMs

Using PCMs, we try to infer the past given the information at hand. To do this, we build models that allow us to test competing hypotheses. These models, based on assumptions, are used to estimate the evolutionary parameters that best describe a given phenomenon.

One of the most common models employed for continuous traits — traits that can take on real values, like brain size — is the single-rate Brownian motion (BM) model.

Traits evolving under Brownian motion change randomly, in distance and direction, over a given time interval. Therefore, the BM model assumes that the change in a trait follows a Gaussian distribution with mean zero and variance proportional to the length of time and the evolutionary rate. Note how it models the change in the trait and not the trait value per se. Briefly, imagine a starting population with a mean trait value x. Any new species emerging from this ancestral population will have deviations from x, say x’, as a function of time and the rate of evolution. In other words, the less time that has passed since a new species diverged from the ancestor, the closer its trait value will be to x.
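To make this concrete, here is a minimal sketch in Python (with NumPy) of a trait changing under single-rate BM along one lineage. The rate, the ancestral trait value and the time span are made-up numbers, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

sigma2 = 0.1     # evolutionary rate: variance accumulated per unit time (illustrative)
x0 = 5.0         # ancestral trait value, e.g. log brain size (illustrative)
t = 10.0         # time since divergence from the ancestor
n_steps = 1000   # discretise the time interval into small steps
dt = t / n_steps

# Under BM, each small interval adds an independent Gaussian change
# with mean zero and variance sigma2 * dt.
changes = rng.normal(loc=0.0, scale=np.sqrt(sigma2 * dt), size=n_steps)
trajectory = x0 + np.cumsum(changes)

# The total change after time t is itself Gaussian with variance sigma2 * t,
# so descendant trait values spread out more the longer they evolve.
print(trajectory[-1])          # one possible descendant trait value
print(np.sqrt(sigma2 * t))     # expected standard deviation of that value around x0
```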

Based on the variance-covariance relationship among species — under BM, the trait values of two species covary in proportion to the branch length they share on the tree — one can estimate the unknown parameters of the BM model, such as the evolutionary rate.
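As a rough illustration of that idea — assuming a toy three-species tree whose shared branch lengths and trait values are entirely made up — the maximum-likelihood estimates of the root state and the rate under single-rate BM can be written directly in terms of that variance-covariance structure:

```python
import numpy as np

# Toy tree with three species: A and B split recently, C is the outgroup.
# Entry C_shared[i, j] is the branch length shared by species i and j from
# the root (illustrative numbers, not a real tree).
C_shared = np.array([[1.0, 0.7, 0.0],
                     [0.7, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

x = np.array([4.2, 4.5, 2.9])   # observed trait values, e.g. log brain sizes (made up)
ones = np.ones(len(x))
Cinv = np.linalg.inv(C_shared)

# Maximum-likelihood estimates under single-rate BM:
root_state = (ones @ Cinv @ x) / (ones @ Cinv @ ones)   # ancestral (root) trait value
resid = x - root_state * ones
rate_hat = (resid @ Cinv @ resid) / len(x)               # evolutionary rate sigma^2

print(root_state, rate_hat)
```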

Owing to the inconsistent nature of evolution, it is common to observe certain traits evolving at a different rate in one group of species (group A) when compared to a distantly related group of species (group B). In such cases, a better alternative would be to employ a multiple-rate BM model — a more complex model than the single-rate one. As evolutionary processes get more and more complex, with an increasing number of unknown parameters, the conventional methods used in PCMs start becoming inadequate. This is particularly a problem when most of the parameters lack an analytical solution.

One way to work around this is to use a simulation-based approach called Approximate Bayesian Computation, or ABC.

Approximate Bayesian Computation (ABC)

Before we get to the how of it, let’s understand ABC first.

Given a set of observed trait data T for species involved in the study and an assumed evolutionary model E with, let’s say, N parameters, the approximate Bayesian computation approach consists of the following steps (Figure 2):

  1. Specify the prior probability distributions of all N parameters of E.
  2. For each simulation, generate a random set of values for the N parameters from their respective prior distributions. Let this random set of values be denoted by R.
  3. Using R, simulate trait data points S.
  4. If S is similar to T, accept R.
  5. Repeat these simulations until a large number of R values have been accepted.
Figure 2: Using ABC in PCMs

Let’s revisit step 4 — if S is similar to T, accept R.

Wait what? If S is similar to T? What does it mean to be “similar”?

Also, what are the chances that a simulation would produce trait values, for all the species included in the analysis, that are identical (or even similar) to the original data set?

What I mean to say is this — imagine you have five mammals — East African Lion, Plains Zebra, Reticulated Giraffe, Hippopotamus and Humans (that being the star cast of the movie Madagascar, except the last mammal on the list). Let’s say we have the data for their brain sizes (denoted by T) as measured by biologists. Now we want to test models, with associated parameters, that can help us understand how brain size evolved in these animals. Let’s assume an analytical solution for the unknown parameters (say N = 3) is not possible. Now, according to the ABC approach, we have to get to a set of parameter values (R, chosen at random from their prior probability distributions) that, after simulation, give us brain size values (denoted as S) which are very similar to T. Isn’t that too abstract to decide if S is similar to T?

Therefore, being clear about what “similar” means in step 4 is crucial.

One approach is to use a summary statistic like the mean, variance, likelihood, etc. to compare S (simulated trait data) and T (observed trait data). Comparing one summary value against another is much easier than comparing a long list of values against another long list.

(imagine including and comparing brain size S-T pair values for all the mammals in the analysis,

…and birds,

…and insects,

… and archaea and bacteria… okay, scratch the last two, they don’t have brains.

I mean, you get the point!).

But again, how similar is similar enough?

In other words, how much deviation between S and T can one tolerate?

One can introduce a measure called the tolerance (t), such that the difference between S and T (measured by some distance index d) must be less than t for R to be accepted.

There are a number of algorithms that can be used to describe the criterion for similarity. What we just saw above is called rejection sampling.
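To tie these pieces together, here is a minimal rejection-sampling sketch in Python for the five-mammal brain size example above. Everything in it is illustrative and not taken from real data: the shared-branch-length matrix standing in for the tree, the “observed” values T, the uniform prior on the rate, the variance as the summary statistic and the tolerance t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix of shared branch lengths for the five mammals (illustrative numbers).
C = np.array([[1.0, 0.6, 0.6, 0.6, 0.1],
              [0.6, 1.0, 0.8, 0.7, 0.1],
              [0.6, 0.8, 1.0, 0.7, 0.1],
              [0.6, 0.7, 0.7, 1.0, 0.1],
              [0.1, 0.1, 0.1, 0.1, 1.0]])
L = np.linalg.cholesky(C)     # lets us simulate correlated tip values under BM

T_obs = np.array([4.3, 4.8, 5.1, 4.6, 7.2])   # "observed" log brain sizes (made up)
root = T_obs.mean()                            # root state fixed for simplicity

def summarise(values):
    # One possible summary statistic: the variance among tip values.
    return np.var(values)

s_obs = summarise(T_obs)
tolerance = 0.05          # how close the summaries must be for R to be accepted
n_sims = 100_000
accepted = []

for _ in range(n_sims):
    rate = rng.uniform(0.0, 5.0)                             # step 2: draw R from the prior
    tips = root + np.sqrt(rate) * (L @ rng.normal(size=5))   # step 3: simulate S under BM
    if abs(summarise(tips) - s_obs) < tolerance:             # step 4: compare summaries
        accepted.append(rate)                                # keep R if close enough

posterior = np.array(accepted)                               # step 5: the accepted R values
print(len(posterior), posterior.mean())
```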

Here’s some food for thought before I end this section — how does one decide on summary statistics appropriate for one’s analysis? The “is there a better alternative?” issue. Go figure!

Tying it all together

So here we are with a set of R (parameter estimates) values accepted over simulations. Now what?

Remember, we used ABC to estimate unknown parameters of trait evolution. We started with prior distributions — our expectations, our beliefs about what those parameters could be. After collecting a set of accepted R values, what we get are the posterior distributions of the parameters. One of the most common measures calculated from a posterior is the maximum a posteriori estimate, or MAP. It is the mode of the distribution, used as the most likely estimate of the parameter. I would like you to revisit the last sentence and appreciate how easy it becomes to infer measures like “the most likely estimate”. This is possible only because Bayesian statistics lets us deal directly in terms of probabilities. Furthermore, we also get the uncertainty associated with the parameter estimates.
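As a small illustration of how those measures come out of the accepted draws — with `posterior` standing in for the accepted R values (replaced here by made-up numbers so the snippet runs on its own) — the MAP and a credible interval can be computed like this:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Stand-in for the accepted rate values from the rejection step; in practice
# this would be the `posterior` array produced by the ABC simulations.
rng = np.random.default_rng(3)
posterior = rng.gamma(shape=9.0, scale=0.25, size=2000)   # placeholder draws, not real output

# MAP estimate: the value at which a kernel density estimate of the posterior peaks.
density = gaussian_kde(posterior)
grid = np.linspace(posterior.min(), posterior.max(), 1000)
map_estimate = grid[np.argmax(density(grid))]

# Uncertainty: a 95% credible interval from the posterior quantiles.
lower, upper = np.quantile(posterior, [0.025, 0.975])

print(f"MAP rate estimate: {map_estimate:.2f}")
print(f"95% credible interval: ({lower:.2f}, {upper:.2f})")
```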

So the recipe is — you have your belief (i.e. the prior), you are presented with some evidence (i.e. trait values), you update your belief (i.e. the posterior) after taking that evidence into account (i.e. the similarity between the observed and the simulated trait values), and you find the most probable estimate of the parameter. That is how we are in real life as well!

Okay, what else can we do with the posterior distributions?

One can use the posterior distributions to see how much they deviate from a value that one expects; let’s say a value under a null hypothesis. Not just that, one can even get posterior distributions for all parameters across different evolutionary models and compare them to find which model(s) best describe(s) the data.
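One common way to do the model-comparison part with ABC (a standard recipe, not something specific to this post) is to pick a candidate model at random in each simulation and track which model the accepted draws came from; the acceptance proportions then approximate the posterior model probabilities. Here is a minimal sketch reusing the toy five-mammal setting and comparing single-rate BM against a model with no phylogenetic signal; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Same toy shared-branch-length matrix and "observed" trait values as before (made up).
C = np.array([[1.0, 0.6, 0.6, 0.6, 0.1],
              [0.6, 1.0, 0.8, 0.7, 0.1],
              [0.6, 0.8, 1.0, 0.7, 0.1],
              [0.6, 0.7, 0.7, 1.0, 0.1],
              [0.1, 0.1, 0.1, 0.1, 1.0]])
L = np.linalg.cholesky(C)
T_obs = np.array([4.3, 4.8, 5.1, 4.6, 7.2])
root = T_obs.mean()

tolerance = 1.5           # tolerance on the distance between simulated and observed tips
n_sims = 100_000
accepted_models = []

for _ in range(n_sims):
    model = rng.integers(2)          # choose a candidate model with equal prior probability
    rate = rng.uniform(0.0, 5.0)     # same prior on the rate for both models
    z = rng.normal(size=5)
    if model == 0:
        tips = root + np.sqrt(rate) * (L @ z)    # model 0: single-rate BM on the tree
    else:
        tips = root + np.sqrt(rate) * z          # model 1: no phylogenetic signal
    if np.linalg.norm(tips - T_obs) < tolerance: # accept if the simulation lands close
        accepted_models.append(model)

accepted_models = np.array(accepted_models)
# The fraction of accepted draws coming from each model approximates its posterior
# probability, given these priors, this distance measure and this tolerance.
print("Posterior probability of single-rate BM:", np.mean(accepted_models == 0))
print("Posterior probability of no signal:", np.mean(accepted_models == 1))
```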

To wrap this section up, we saw how parameters without analytical solutions can be approximated using ABC. There are other situations where ABC might help us. For example, it is often not possible to sample every species belonging to a group (just as this post is not exhaustive for the topic it presents) or to have observed trait data for all the species one wishes to study. This is a problem known to reduce the accuracy with which parameters are estimated. ABC comes in handy here, since it depends on comparing summary statistics of the observed and simulated trait data. And as long as the sampling is representative enough, the power to test across models remains uncompromised as well.

Looking ahead

In all, I feel the use of ABC in PCMs is relatively rare, with an untapped potential waiting to be explored. For example, using this approach one can actually focus on studying different stages of an evolutionary process. This is different from what PCMs are usually used for — studying just the patterns of trait evolution. In this article, my intention was to provide a connecting link between ABC and PCMs, and to introduce tidbits to ponder later. If any of you reading this article is interested in a specific topic, I could expand on it in my next posts.
