R Code


The purpose of this page is to document a collection of R code chunks, which I have created to make analysing and visualising microbial community composition data easier within the R and phyloseq environment.

 

Splitting Proteobacteria


When plotting microbial community composition at the phylum level, it is common practice to represent proteobacteria at the class level. This is because of the enormous phylogenetic diversity constrained within the protebacteria phylum. Below is a quick function, which can be used to replace the phylum level classification of proteobacteria with the corresponding class level classification for each ASV. The function requires only your phyloseq object as an input. There are two iterations of the function, depending on whether your tax_table() is using silva or greengenes taxonomy.

Note that this function is exclusively to make plotting easier.

# Silva classification
split_proteo_silva <- function(physeq) {
  one = as.data.frame(tax_table(physeq))
  two = as.data.frame(lapply(one, as.character), stringsAsFactors = F)
  two$Phylum[two$Phylum=="Proteobacteria"] = two$Class[two$Phylum=="Proteobacteria"]
  two[] = lapply(two, factor)
  three = tax_table(two)
  rownames(three) = rownames(tax_table(physeq))
  colnames(three) = colnames(tax_table(physeq))
  tax_table(physeq) = three
  return(physeq)
}

# Greengenes classification
split_proteo_gg <- function(physeq) {
  one = as.data.frame(tax_table(physeq))
  two = as.data.frame(lapply(one, as.character), stringsAsFactors = F)
  two$Phylum[two$Phylum=="p_Proteobacteria"] = two$Class[two$Phylum=="p_Proteobacteria"]
  two[] = lapply(two, factor)
  three = tax_table(two)
  rownames(three) = rownames(tax_table(physeq))
  colnames(three) = colnames(tax_table(physeq))
  tax_table(physeq) = three
  return(physeq)
}

Below is an example of function use, and a method to double check that the function has worked correctly.

# create new phyloseq object with split proteobacteria
new.phy <- split_proteo_silva(old.phy)

# check that the tax_table of the new phyloseq object displays protebacterial classes in the phylum column
head(tax_table(new.phy))

# check that the tax_table of the new and old phyloseq objects are identical (should be TRUE), excluding the phylum column
identical(tax_table(new.phy)[1:100, 3:7], tax_table(old.phy)[1:100, 3:7])

 

Extract pairwise distance vector from matrix


When conducting beta-diversiy analyses of microbial communities, it can be useful to extract the pairwise distances between sites as a vector. One purpose of this would be to use a vector of pairwise distances to compare within-group distances between conditions, and subsequently analyse and plot these data. Below is a very simple function to extract only the bottom half of a distance matrix in the format generated through the vegan or phyloseq packages. This function does not extract the diagonal values of the matrix (all 0). The purpose of extracting only one half of the matrix is to avoid duplication of pairwise distances.

extract.matrix <- function(dist.matrix) {
  lower.tri(as.matrix(dist.matrix))
}