Understanding the structure of an object of class Phylo

Mohak Sharda
5 min read · Jun 25, 2020

The R package ape is widely used for phylogenetic analyses and for the comparative methods used to study evolution across organisms. The first thing any such study needs is a way to store all the features that define a phylogenetic tree. ape does this with a special structure: an object of class ‘phylo’.

What are these features? What is this structure? How are they stored? How do they look when visualised?

Let’s try and understand these things.

We will start with a top-down approach. Let’s look at a phylogenetic tree consisting of five individuals. To keep things simple, let’s name the individuals A, B, C, D and E.

Figure 1. A phylogenetic tree depicting the relationship between five individuals.

Let’s try and get a structure to build this tree.

We are going to code this in R.

Let’s define an empty variable called phylogenetic_tree of the type list().

phylogenetic_tree<- list()

Next, we are going to assign the class name ‘phylo’ to our variable. This will allow us to use the functions in ape that work only on objects of class ‘phylo’.

class(phylogenetic_tree) <- "phylo"

You can think of the class ‘phylo’ as a rulebook that lists the features every tree should have. Assigning the class does not create those features for us; our object, in this example called phylogenetic_tree, still has to be given each of them, which is what we do in the rest of this post.
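As a small check of what the class assignment does, here is a minimal base-R sketch. It only inspects the object; nothing in it is specific to ape.

```r
phylogenetic_tree <- list()
class(phylogenetic_tree) <- "phylo"

class(phylogenetic_tree)              # "phylo"
inherits(phylogenetic_tree, "phylo")  # TRUE
is.list(phylogenetic_tree)            # TRUE: underneath, it is still a plain list
length(phylogenetic_tree)             # 0: the label alone adds no features
```

The class is only a label at this point. R does not verify that the list actually contains what a tree needs, so it is up to us to fill it in correctly.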

Let’s give our tree some structure now. We want to make the tree depicted in figure 1. There are five individuals, also called the leaves (or tips) of the tree, numbered 1 to 5. There is one root for the entire tree, node number 6. It is the Most Recent Common Ancestor (MRCA) of the five individuals in our example. Besides the root, there are three internal nodes labelled 7, 8 and 9. We are going to make a fully bifurcating rooted tree. The root, internal node 6, bifurcates into two children, or descendant nodes, 7 and 9. Internal node 7 bifurcates further into individual A and internal node 8.

Why are the direct descendants of the root 7 and 9, rather than 7 and 8? The answer lies in the order in which the tree is traversed when it is read.

Stand at the root of the tree (level 1), facing the leaves, and start moving along the path. Whenever the path bifurcates, the traversal visits the node on the right (internal or tip) first and then the node on the left. In our case, we first move from the root, node 6, to an internal node, which we number 7. This node (level 2) bifurcates again. On its right is a leaf, individual A, so that path terminates there. We then move to its left, which is an internal node and therefore the next in sequence: internal node 8 (level 3). This bifurcates into two leaves, B and C, and there are no further levels below it. We come back up to the root, whose left-hand branch leads to one more internal node at level 2, which we number 9. Finally, node 9 terminates in two leaves, D and E.

(If you read the previous paragraph without visualising each line on the tree as depicted in Figure 1, I would urge you to do that before proceeding ahead.)
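The walk described above can be sketched in a few lines of base R. The `children` list and the `visit()` helper are hypothetical names introduced only for this illustration (they are not part of ape); the list simply records, for each internal node of Figure 1, its two descendants in the order we visit them.

```r
# Hypothetical helper data: for each internal node, its two children
# in the order they are visited (right-hand branch first).
children <- list("6" = c(7, 9), "7" = c(1, 8), "8" = c(2, 3), "9" = c(4, 5))

# Walk the tree from `node`, recording one (parent, child) row per step
# and fully exploring a child before moving on to its sibling.
visit <- function(node) {
  rows <- NULL
  for (child in children[[as.character(node)]]) {
    rows <- rbind(rows, c(node, child))
    # Tips have no entry in `children`, so the lookup returns NULL for them.
    if (!is.null(children[[as.character(child)]])) {
      rows <- rbind(rows, visit(child))
    }
  }
  rows
}

edge_order <- visit(6)  # start at the root
edge_order
```

The result has one row per step of the walk, in the order the steps were taken, which is the order used for the edge matrix in this post.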

Let’s code this in R, exactly the same way we traversed.

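phylogenetic_tree$edge <- matrix(c(6,7,  # root 6 -> internal node 7
7,1,  # node 7 -> tip 1
7,8,  # node 7 -> internal node 8
8,2,  # node 8 -> tip 2
8,3,  # node 8 -> tip 3
6,9,  # back at the root: 6 -> internal node 9
9,4,  # node 9 -> tip 4
9,5), # node 9 -> tip 5
8,2,byrow=TRUE)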

The above command creates a matrix with 8 rows and 2 columns. Each row represents one step (formally called an edge) from one node to the next, and together the 8 rows cover the entire tree. The two columns keep track of the nodes at either end of each edge: the first column holds the ancestor and the second its descendant. Remember, we are storing this as a feature of our variable phylogenetic_tree, an object of the class ‘phylo’. We can access this feature with the syntax phylogenetic_tree$edge.
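To see how much information the edge matrix already carries, here is a minimal base-R sketch that recovers the root, the tips and the internal nodes from the two columns alone. The variable names (`parents`, `kids`, and so on) are chosen just for this illustration.

```r
edge <- matrix(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,9, 9,4, 9,5), ncol = 2, byrow = TRUE)

parents <- edge[, 1]   # node each edge starts from
kids    <- edge[, 2]   # node each edge leads to

root     <- setdiff(parents, kids)  # the only node that is never a child
tips     <- setdiff(kids, parents)  # nodes that are never a parent
internal <- unique(parents)         # every node with descendants (root included)

root        # 6
tips        # 1 2 3 4 5
internal    # 6 7 8 9
nrow(edge)  # 8, i.e. 2 * length(tips) - 2 for a rooted, fully bifurcating tree
```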

Let us give our object some more properties. Let’s add a feature Nnode that stores the number of internal nodes in our tree (four, counting the root). Let us label our internal nodes and tips (or leaves) as well.

phylogenetic_tree$Nnode <- 4
phylogenetic_tree$node.label <- c(6,7,8,9) #as per our traversal
phylogenetic_tree$tip.label <- c("A","B","C","D","E")
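The labels connect to the edge matrix through the node numbers: tips are numbered 1 to 5 in the order of tip.label, and the internal nodes follow as 6 to 9 in the order of node.label. A minimal base-R sketch of that lookup (`all_labels` and `readable` are names made up for this example):

```r
edge <- matrix(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,9, 9,4, 9,5), ncol = 2, byrow = TRUE)
tip.label  <- c("A", "B", "C", "D", "E")
node.label <- c(6, 7, 8, 9)

# Position i of this vector holds the label of node number i:
# tips occupy positions 1..5, internal nodes positions 6..9.
all_labels <- c(tip.label, node.label)

# The same edge matrix, written with labels instead of node numbers.
readable <- data.frame(from = all_labels[edge[, 1]],
                       to   = all_labels[edge[, 2]])
readable

# Nnode should agree with the number of distinct parents in the edge matrix.
length(unique(edge[, 1])) == length(node.label)  # TRUE
```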

Let’s also assign lengths to the edges. So far we have only worked on the topology of the tree; we are yet to scale the edges. The length of an edge represents either the time elapsed or the genetic distance (changes accumulated in the genome) between the ancestral node at one end of the edge and the descendant at the other. A tree without edge lengths records only the topology: it tells us who shares an ancestor with whom, but not how much each lineage has diverged. (We will talk more on this in the subsequent posts.) We will see how ape represents this sort of tree at the end of this post.

In figure 1, individual B has diverged the most. We will give its terminal edge the maximum length of 1 and scale all the other edge lengths relative to it.

Let’s code this. Again, the order of assignment follows the traversal path we have been using so far. In other words, the order of the rows in our matrix edge decides the order of the values in our vector edge.length: the i-th length belongs to the edge in the i-th row.

phylogenetic_tree$edge.length <- c(0.4,0.7,0.5,1,0.3,0.4,0.5,0.4)
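Because the i-th length belongs to the i-th row of the edge matrix, the total distance from the root to each tip can be worked out with one pass over the rows. This is a minimal base-R sketch; `depth` and `root_to_tip` are names chosen for the illustration, and the single pass relies on the traversal order used in this post, in which every parent appears before its children.

```r
edge <- matrix(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,9, 9,4, 9,5), ncol = 2, byrow = TRUE)
edge.length <- c(0.4, 0.7, 0.5, 1, 0.3, 0.4, 0.5, 0.4)
tip.label <- c("A", "B", "C", "D", "E")

# depth[i] = summed edge length from the root (node 6) to node i.
# The root keeps a depth of 0; each row adds its own length to the
# depth already stored for its parent.
depth <- numeric(max(edge))
for (i in seq_len(nrow(edge))) {
  depth[edge[i, 2]] <- depth[edge[i, 1]] + edge.length[i]
}

root_to_tip <- setNames(depth[1:5], tip.label)
root_to_tip
names(which.max(root_to_tip))  # "B"
```

With these lengths, B is also the tip farthest from the root overall, not only the one with the longest terminal edge.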

Great! We have given our object phylogenetic_tree the properties defined for the class ‘phylo’. This will let us analyse our tree with the other functions available in ape. I will cover those in subsequent posts.

Finally, to plot phylogenetic_tree, we are going to use a function available in ape. Let’s install and load the package first and then use its function plot.phylo() to plot our tree as depicted in figure 1.

install.packages("ape")
library(ape)
plot.phylo(phylogenetic_tree, show.node.label = TRUE)

This function, by default, only shows the tip labels. The plot.phylo() option show.node.label has to be set to TRUE in order to label the internal nodes as well (as seen in Figure 1).
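If you would like to see the edge lengths on the plot as well, the sketch below rebuilds the tree from this post in one step and annotates it with ape's edgelabels() and axisPhylo(). Two choices here are mine rather than part of the walkthrough above: the edge matrix is stored as integers (which is how ape stores it when it reads a tree itself), and the plotting is wrapped in a requireNamespace() check so the block does nothing if ape is not installed.

```r
# Rebuild the tree from this post so the block runs on its own.
phylogenetic_tree <- list(
  edge = matrix(as.integer(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,9, 9,4, 9,5)),
                ncol = 2, byrow = TRUE),
  Nnode = 4L,
  node.label = c(6, 7, 8, 9),
  tip.label = c("A", "B", "C", "D", "E"),
  edge.length = c(0.4, 0.7, 0.5, 1, 0.3, 0.4, 0.5, 0.4)
)
class(phylogenetic_tree) <- "phylo"

# Only draw if ape is installed (run install.packages("ape") otherwise).
if (requireNamespace("ape", quietly = TRUE)) {
  library(ape)
  plot.phylo(phylogenetic_tree, show.node.label = TRUE)
  # Write each edge length next to its branch, without a box around it.
  edgelabels(phylogenetic_tree$edge.length, frame = "none", adj = c(0.5, -0.3))
  # Add a horizontal axis so branch lengths can be read off the plot.
  axisPhylo()
}
```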

For more information on the different functions in ape and their options, you can check out its extensive documentation:

https://www.rdocumentation.org/packages/ape/versions/5.3

Now, coming back to how ape represents a tree without any edge lengths. Let’s write the code again, this time without creating an edge.length vector:

ptree <- list()
class(ptree) <- "phylo"
ptree$edge <- matrix(c(6,7,
7,1,
7,8,
8,2,
8,3,
6,9,
9,4,
9,5),8,2,byrow=TRUE)
ptree$Nnode <- 4
ptree$node.label <- c(6,7,8,9)
ptree$tip.label <- c("Species_A","Species_B","Species_C","Species_D","Species_E")
library(ape)
plot.phylo(ptree,show.node.label = TRUE)

For visualisation purposes, ape will draw the tree as shown in figure 2, even though we haven’t supplied any edge lengths. The branches in this plot are therefore not to scale. For example, we cannot say that species D and species E have diverged equally from their ancestor at node 9, even though the plot might suggest so.

Figure 2. A phylogenetic tree representing the relationship among five species, A to E, not scaled according to edge lengths.
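A quick way to tell the two kinds of tree apart in code is to check whether the object has an edge.length element at all. This is a minimal base-R sketch with the same topology as above, built in one step for brevity.

```r
ptree <- list(
  edge = matrix(c(6,7, 7,1, 7,8, 8,2, 8,3, 6,9, 9,4, 9,5), ncol = 2, byrow = TRUE),
  Nnode = 4,
  node.label = c(6, 7, 8, 9),
  tip.label = paste0("Species_", c("A", "B", "C", "D", "E"))
)
class(ptree) <- "phylo"

is.null(ptree$edge.length)       # TRUE: no lengths were stored
"edge.length" %in% names(ptree)  # FALSE
```

The reverse also holds: a tree that does have edge lengths can be drawn in this unscaled style by passing use.edge.length = FALSE to plot.phylo().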

That’s all for this post, folks. I will be covering other subtle concepts related to phylogenetics and phylogenetic comparative methods as a part of this series. If there is something you guys would specifically like to read/know, please let me know in the comments below. Happy reading!
