Summary from the authors: Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest

Biodiversity inventories can now be built by collecting and sequencing DNA from the environment, which is not only easier, faster and cheaper than direct observation, but also much more comprehensive and systematic. In particular, this gives unprecedented access to little-known microbial diversity. Tapping these data to answer community ecology questions, however, can prove a daunting task, as classical statistical approaches often cannot cope with the size and complexity of molecular datasets. To uncover the spatial structure of soil biodiversity over 12 ha of primary tropical forest in French Guiana, we borrowed a probabilistic model from text analysis, Latent Dirichlet Allocation. After demonstrating the performance of the method on simulated data, we used it to capture the co-occurrence and covariance patterns of more than 25,000 taxa of bacteria, protists, fungi and metazoans across 1,131 soil samples collected every 10 m, a dataset that led to a previous publication in Molecular Ecology (Zinger et al. 2019). We find that, even though the forest plot looks rather uniform at first sight, bacteria, protists and fungi are all clearly structured into three assemblages matching the environmental heterogeneity of the plot, whereas metazoans are unstructured at that scale. We then work through the practical problems ecologists may encounter when using this approach, such as whether to use presence-absence or read-count data, how to choose the number of assemblages, and how to assess the robustness of the results. Finally, we discuss the potential use of related methods in community ecology and biogeography, and argue that probabilistic models are a way forward for analyzing the ever-expanding amount of data generated by the field.
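For readers who want to see the idea in miniature, the following is a minimal sketch of Latent Dirichlet Allocation applied to presence-absence data, in which each soil sample plays the role of a document and each OTU present in it plays the role of a word. Everything here is an illustrative assumption rather than the analysis from the article: the toy data are simulated, the function name `fit_lda_gibbs` and the hyperparameters `alpha` and `beta` are hypothetical, and inference uses a basic collapsed Gibbs sampler chosen for brevity, which is not necessarily the inference scheme used in the paper.

```python
import random

def fit_lda_gibbs(samples, n_otus, n_assemblages, alpha=0.1, beta=0.1,
                  n_iter=100, seed=0):
    """Collapsed Gibbs sampler for LDA on presence-absence data.

    samples: list of lists; samples[d] holds the OTU indices present in sample d.
    Returns theta, where theta[d][k] is the estimated proportion of
    assemblage k in sample d.
    """
    rng = random.Random(seed)
    K = n_assemblages
    n_dk = [[0] * K for _ in samples]          # OTU occurrences per sample and assemblage
    n_kw = [[0] * n_otus for _ in range(K)]    # occurrences per assemblage and OTU
    n_k = [0] * K                              # total occurrences per assemblage
    z = []                                     # assemblage assigned to each occurrence

    # Random initial assignment of every OTU occurrence to an assemblage.
    for d, otus in enumerate(samples):
        z_d = []
        for w in otus:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, otus in enumerate(samples):
            for i, w in enumerate(otus):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional probability of each assemblage (up to a constant).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + n_otus * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed assemblage proportions per sample.
    return [
        [(n_dk[d][k] + alpha) / (len(otus) + K * alpha) for k in range(K)]
        for d, otus in enumerate(samples)
    ]

# Simulated toy data (an assumption for illustration): 60 samples, 30 OTUs,
# 3 habitats. OTUs 0-9 are typical of habitat 0, 10-19 of habitat 1, and
# 20-29 of habitat 2; other OTUs occur only occasionally.
data_rng = random.Random(42)
n_otus = 30
true_habitat = [d % 3 for d in range(60)]
samples = []
for habitat in true_habitat:
    present = [
        w for w in range(n_otus)
        if data_rng.random() < (0.7 if w // 10 == habitat else 0.05)
    ]
    samples.append(present)

theta = fit_lda_gibbs(samples, n_otus=n_otus, n_assemblages=3)

# Dominant assemblage per sample. Assemblage labels are arbitrary, so they
# need not be numbered the same way as the simulated habitats.
dominant = [max(range(3), key=lambda k: row[k]) for row in theta]
```

Mapping `theta` back onto the sampling coordinates would give the kind of assemblage map shown in the figure below. The number of assemblages is fixed at three here only because the toy data were simulated that way; choosing it for real data is one of the practical questions discussed in the article.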

Left: Primary tropical forest understory on the plot where the data were collected, Nouragues ecological research station, French Guiana. Right: Spatial distribution of assemblages of co-occurring soil taxa (OTUs), obtained by Latent Dirichlet Allocation from OTU presence-absence information only, over a 12-ha plot of plateau forest sampled every 10 m (top); and two main axes of environmental variation over the same forest plot, derived from Airborne Laser Scanning (bottom). Bacteria, protists and fungi exhibit a spatial pattern matching the environmental heterogeneity of the plot: the blue, green and red assemblages match the terra firme, hydromorphic and exposed rock parts of the plot, respectively. In contrast, metazoans such as annelids can be shown to be spatially unstructured at that scale. Sampled locations are indicated by dark dots, and values have been interpolated between samples using kriging.

Full article:
Sommeria-Klein, G., Zinger, L., Coissac, E., et al. (2020). Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest. Molecular Ecology Resources, 20, 371–386.

Zinger, L., Taberlet, P., et al. (2019). Body size determines soil community assembly in a tropical forest. Molecular Ecology, 28(3), 528–543.
