As genomic and ecological data sets grow larger, researchers are flooded with far more information than was available when many conventional model-based approaches were designed. To deal with these massive amounts of data, many researchers have turned to machine learning techniques, which promise the ability to help find signals within the noise of the complex data sets generated by modern sequencing approaches. Applications for machine learning in molecular ecology are broad and include global studies of biodiversity patterns, species delimitation studies, and studies of the genomic architecture of adaptation, among many others. Here at Molecular Ecology Resources, we are excited to highlight research that applies supervised and unsupervised machine learning algorithms to answer questions of interest to the readership of molecular ecology. This special issue will also highlight the nuances and limitations of machine-learning techniques. Rather than focusing on the supposed differences between machine-learning and model-based approaches, this issue aims to highlight the broad spectrum of machine-learning approaches, many of which can incorporate model-based expectations and predictions.
We are soliciting original research that presents novel, robust applications of machine learning methods to molecular data to address questions across ecological disciplines.
Manuscripts should be submitted in the usual way through the Molecular Ecology Resources website. Submissions should clearly state in the cover letter accompanying the submission that you wish the manuscript to be considered for publication as part of this special issue. Pre-submission inquiries are not necessary, but any questions can be directed to: firstname.lastname@example.org
Special issue editors: Nick Fountain-Jones, Megan Smith & Frédéric Austerlitz
Individuals within a species vary, and this variation can have important implications for the role a species may play within ecosystems. We compared the relative importance of variation within a species due to genetic changes within its own genome versus symbiotic interactions between the focal species and its associated bacteria, also called its microbiome. We focused on Microcystis aeruginosa, a globally distributed photosynthetic cyanobacterium, also known as blue-green algae, that often dominates freshwater harmful algal blooms.
These blooms have recently become more common and intense worldwide, causing major economic and ecological damage. We studied Microcystis and their associated microbiomes from lakes in Michigan, USA that vary in phosphorus content, which is the primary limiting nutrient in lakes. We found genomic changes among strains of Microcystis along this phosphorus gradient that indicated increased efficiency in the use of phosphorus and nitrogen. Intriguingly, we found that genotypes adapted to different nutrient environments co-occurred in phosphorus‐rich lakes. This co-occurrence may have critical implications for understanding how Microcystis blooms persist for many months, long after nutrients become depleted within lakes. Similar to previous findings in, for example, the human microbiome, we uncovered that the bacteria comprising the microbiomes of Microcystis varied in community composition but were more stable at the level of functional contributions to their hosts across the phosphorus gradient. Finally, while our work was mostly focused on unraveling the genomic underpinnings of nutrient adaptation, we also observed consequences of these differences in Microcystis genome and microbiome composition at a physiological level. In particular, when nutrients were provided in abundance, Microcystis (and its microbiome) that had evolved to thrive in low-phosphorus environments could not grow as rapidly as strains from high-phosphorus environments.
– Sara Jackrel, Postdoctoral Fellow, University of Michigan.
What is the unit of conservation? Is it similar for different types of plants? How can the reproductive biology of an organism inform best practices in conserving threatened species? In her doctoral research, Nicole Bezemer is studying Eucalyptus species from South Western Australia to better understand population dynamics in long-lived organisms and how this can lead to better management of their populations. Surprisingly, many of the small and fragmented populations of the two subspecies of E. caesia she studied are genetically differentiated at a fine spatial scale, and high levels of heterozygosity persist even in populations with a dozen individuals. Nicole and colleagues suggest the clonal and perennial nature of E. caesia might contribute to these unusual patterns of genetic diversity and divergence, and suggest that traditional conservation genetic approaches might be detrimental for naturally fragmented species with these life-history characteristics. Read here about her experience in developing this research.
What led to your interest in this topic / what was the motivation for this study? Eucalyptus caesia is an intriguing study species, given the combination of a distribution on scattered granite outcrops, a long history of geographic and genetic insularity, a capacity for individual longevity via lignotuber re-sprouting, a lack of recent recruitment in most known stands, and adaptation for pollination by nectarivorous birds. After completing my Honours research at the Boyagin stand of E. caesia, I was hooked. The present study came to fruition upon discovering that one of my PhD experiments, involving 6 months of controlled cross-pollinations, was killed by a series of frosts. I had already genotyped two large stands of E. caesia and I was curious about what patterns of genetic structure might exist in other stands, and across the species’ landscape distribution.
What difficulties did you run into along the way? Some stands of E. caesia are located on immense granite outcrops, often hidden in hard-to-access gullies or behind thick barricades of vegetation. The first challenging aspect of the project was to find the sub-populations of E. caesia at each new location. For many populations, I did so by embarking on a Google Earth tour led by my supervisor, Steve Hopper, who has worked on the granite outcrop flora of south-west Australia and on E. caesia for nearly four decades. Nonetheless, I spent many hours traversing granite outcrops, sometimes in circles, which occasionally led to finding additional plants or, in the case of the E. caesia at Old Muntadgin, a previously undocumented population of several hundred plants.
What is the biggest or most surprising innovation highlighted in this study? I was surprised by the apparent lack of genetic interconnection between some stands over relatively small spatial scales. Given the long history of population fragmentation and reproductive biology of E. caesia (multiple modes of reproduction and gravity-dispersed seed), I anticipated that high levels of genetic differentiation would feature. Regardless, it was surprising to find that, in some instances, the level of genetic differentiation within stands exceeded that among stands. Another interesting result revealed by comprehensive genotyping was that some census population sizes were very small. Seven stands comprised fewer than ten unique multi-locus genotypes, and three locations had only one or two genotypes. Localised clonal reproduction is clearly of paramount importance to the persistence of these stands.
Moving forward, what are the next steps in this area of research? The next step is to further test the genetic integrity of the two subspecies, E. caesia subsp. caesia and E. caesia subsp. magna, by genotyping plants from additional stands. Walyamoning and Yanneymooning are geographical outliers to other stands of subsp. caesia and occur within relatively close proximity to the group of subsp. magna populations located in the north-east of the species distribution. We propose to genotype a sample of individuals from the two outlier stands of subsp. caesia, and at three additional locations of subsp. magna, to test whether the two subspecies are genetically distinct even when populations are sympatric, and to determine if hybridisation has occurred.
What would your message be for students about to start developing or using novel techniques in Molecular Ecology? My message to other young or early-career researchers is to have a clear research outcome in mind before exploring the application of novel techniques. Avoid putting yourself in the position of having to come up with a hypothesis after the fact.
What have you learned about methods and resources development over the course of this project? Comprehensive genotyping at multiple spatial scales may provide a more complete picture of spatial genetic structure compared to studies where sampling efforts are focused on few individuals from many populations, or on many individuals from few populations. There is still much to be gained from population genetic studies, especially in understudied, biodiverse, endemism hotspots such as granite outcrops, and in understudied systems such as small, historically fragmented populations of long-lived trees.
Describe the significance of this research for the general scientific community in one sentence. Anciently fragmented plant populations may be adept at persisting as small populations with low genetic diversity and limited genetic interconnection, and therefore attempts to connect such populations may be ineffective or even harmful.
Describe the significance of this research for your scientific community in one sentence. Small populations of long-lived woody perennial plants, even those comprising a handful of individuals, may contain unique genotypes that contribute to overall species genetic diversity, and are worthy of conservation.
What are Hill Numbers? What do they have to do with estimating biodiversity? How can you use them as a Molecular Ecologist? Read the recent review in Molecular Ecology Resources by Antton Alberdi and Thomas Gilbert on this topic, and read the interview with Antton below to learn how they think about Hill numbers and their applications to metabarcoding. Also, check out hilldiv, “an R package to assist analysis of diversity for diet reconstruction, microbial community profiling or more general ecosystem characterisation analyses based on Hill numbers, using OTU tables and associated phylogenetic trees as inputs. The package includes functions for (phylo)diversity measurement, (phylo)diversity profile plotting, (phylo)diversity comparison between samples and groups, (phylo)diversity partitioning and (dis)similarity measurement. All of these grounded in abundance-based and incidence-based Hill numbers.”
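For readers who have not met Hill numbers before, the short sketch below may help. It is written in plain Python and does not use or reproduce the hilldiv API; the function name `hill_number` and the example counts are made up for illustration. It assumes the standard abundance-based definition, in which the Hill number of order q is (Σ p_i^q)^(1/(1−q)) for relative abundances p_i, with the exponential of Shannon entropy as the limit at q = 1.

```python
import math

def hill_number(abundances, q):
    """Abundance-based Hill number of order q (hypothetical helper, not hilldiv)."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Convert counts to relative abundances, dropping taxa with zero counts.
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is exp(Shannon entropy).
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

# Made-up OTU counts for two samples with the same number of OTUs.
even_sample = [10, 10, 10, 10]
uneven_sample = [90, 5, 5]

for q in (0, 1, 2):
    print(q, hill_number(even_sample, q), hill_number(uneven_sample, q))
```

At q = 0 the value is simply the number of OTUs present. As q increases, rare OTUs count for less, so an uneven sample yields fewer "effective" OTUs than an even sample of the same richness. This is why the results can be read in units of effective numbers of taxa.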
What led to your interest in this topic / what was the motivation for this study? Measuring, estimating and contrasting biological diversity are central operations in most ecological studies. In the last decades, dozens of diversity indices and metrics have been proposed, each with their individual strengths and weaknesses, and specific mathematical assumptions. The measures that many of them yield are difficult to interpret, because the values might refer to abstract units, which lack a straightforward interpretation for non-specialists. We believe that the statistical framework developed around the Hill numbers overcomes many of these problems, and provides a statistical toolset that is extremely useful for ecologists. Besides, Hill numbers enable incorporating complementary information, such as phylogenetic dissimilarities across organisms, which is really handy for molecular ecologists who can easily build phylogenetic trees from metabarcoding data.
What difficulties did you run into along the way? We are a molecular ecologist and an evolutionary biologist who use many different mathematical tools, but are not expert mathematicians. Hence, one of the main challenges was to make sure that all the statements and mathematical interpretations were correct!
What is the biggest or most surprising innovation highlighted in this study? The aim of our review was to demonstrate to ecologists, who like us might have a limited mathematical background, that implementing the framework developed around the Hill numbers is not difficult, and has big potential gains. In our review we gathered information and tools generated by others, mainly Lou Jost, Anne Chao and Chun-Huo Chiu, and displayed them in a comprehensive way for molecular ecologists. We have tried to explain complex mathematical formulations in layman's terms, exactly as we would like others to explain to us content we are not familiar with. We have provided examples and pieces of code that we hope will encourage other researchers to use these tools.
Moving forward, what are the next steps in this area of research? Our article mainly focuses on diversity measurement from data generated using DNA metabarcoding. While bioinformatic methods to generate metabarcoding data have received much attention in the last decade, the impact of the statistical approaches used to analyse diversity has been less studied. Assessing their impact and providing guidelines for selecting the tool best suited to address specific questions with specific types of data will be an important next step in the area of metabarcoding-based diversity analyses.
What would your message be for students about to start developing or using novel techniques in Molecular Ecology? Despite the fact that they might at first seem complex and abstract, bioinformatic and statistical tools are necessary to address ecological questions. Hence, we would encourage students to try to understand the basic bioinformatic and statistical procedures, so as to be able to select the best tools to address their research questions.
What have you learned about methods and resources development over the course of this project? That it's not always the most broadly employed tools that are the best way to address scientific questions!
Describe the significance of this research for your scientific community in one sentence. Hill numbers provide powerful, solid and versatile tools with which to carry out most of the analyses that are needed to assess biological diversity within a common statistical framework.
In this study, Foote et al. study the complex demographic history of killer whales and show how episodic gene flow is ubiquitous in their natural populations. This observation adds to the increasing recognition that the traditional geographical characterization of populations (i.e., allopatry, parapatry, and sympatry) is dynamic over time. Although in general it is difficult to perform deep sampling across the range of a species, cut through artificial taxonomic boundaries, and access enough genomic resources for a taxon, their journey is a great example of how to do this, and of how powerful population genetic methods can reveal the history of vagile and widely distributed species on earth.
What led to your interest in this topic / what was the motivation for this study? I’ve been working together with Phil Morin at Southwest Fisheries Science Centre for the last ten years, using genetic data to try and unravel the complex demographic and evolutionary history of killer whales. One of the key questions has been whether killer whale ecotypes arose from independent founder events and secondary contact, or through gradual divergence in sympatry. This study started out trying to model those processes (in collaboration with Laurent Excoffier) using genomes we had previously sequenced for a subset of the well-described killer whale ecotypes. We struggled to find a good model to fit the data, and it eventually became clear that we just had too few pieces of the jigsaw to be able to see the complete picture. We decided to cast a wider net and looked back at our previous global study published in Molecular Ecology in 2015, to select a dataset of samples that was representative of the global genetic variation in killer whales for genome sequencing. Having worked in the Centre for GeoGenetics, Copenhagen and the CMPG, Bern – both largely focused on human genetic variation – and being a keen follower of that literature, it was a great opportunity to apply methods developed in that field to the killer whales.
What difficulties did you run into along the way? Arguably, the biggest hurdle to overcome was bringing clarity to the very complex relationships between these killer whale populations. This was exacerbated by trying to include too many analyses in earlier drafts. We had a draft manuscript ready almost a year ago, which consisted of two parts: the demographic and evolutionary history of these populations; and the genomic consequences of these different demographic histories. However, this manuscript had become a behemoth! Thankfully, Jochen Wolf, one of the first coauthors to tackle a full read-through of this weighty tome, suggested this might be better digested in separate sittings. So the paper became focused on the evolutionary history and hopefully is an easier read…thanks to Jochen.
What is the biggest or most surprising finding from this study? The ghost ancestry in the Antarctic types, which was something I had suspected we might find, was only really possible to test for due to methods being released as we were writing up the paper. Clearly, we weren’t the only ones thinking along these lines, as several other studies on species including seabass and bonobos released similar findings of ghost ancestry around the same time – this is really nicely highlighted in the perspective by Jacobs and Therkildsen, in the same issue of
Moving forward, what are the next steps for this
A key interest is how variation in the genomic architecture, principally local recombination rate, influences the frequency of different ancestry components within a population and how that relates to past demographic history. As alluded to above, we have results on the impacts of these complex demographic histories in a study we are just finishing up. As a follow up, we will explore further the history of the ghost ancestry, to find out if it conveys any benefits (adaptive variants) or costs (mutation load), such as we see in Neanderthal ancestry in modern humans. And ultimately we hope to better understand the underlying processes determining the genetic differentiation between sympatric killer whale ecotypes.
What would your message be for students about to start their first research projects in this topic? I’d recommend having a good understanding of the concepts, methods and models commonly used in population genetics. I’ve been reading Matt Hahn’s Molecular Population Genetics book and Graham Coop’s Population Genetic Notes, which is freely available to download from Graham’s brilliant blog – gcbias.org. Often methods will give seemingly contradictory results, and so it is important to be able to understand how those analyses work to be able to puzzle out the different signals from different methods. The two resources above will also help you design your sampling scheme and plan your study out ahead of time, so that it is best suited to the question you are trying to address.
What have you learned about science over the course of this project? I feel I’ve learned a lot. It has been a labour of love, the sequencing even being partly funded by my Swiss pension scheme which I cashed in when I left Bern. So, I didn’t feel like I had to please anyone but myself, and to be honest, I thought it was such a complex story and quite species-focused that it wouldn’t be of broad interest. But in fact, it is the paper that I’ve had the most direct and positive feedback on from colleagues. So that has been both surprising and satisfying. The lesson I take from that is to always try and work on something that you are passionate about.
I also feel that as I was learning to better understand the methods and the analyses, I was trying really hard to pass that on to the reader, assuming they may be as naïve as I was before I delved into this study. And based on the feedback, that is something that folk appreciate, and which makes the paper more intuitive and transparent. I have tried to expand upon this in a YouTube video.
Describe the significance of this research for the general scientific community in one sentence. Genome sequences are a record of the many genealogies that comprise our ancestry. Our study highlights how a relatively small number of genomes can reveal the complex relationship among populations, past and present, across the globe.
Describe the significance of this research for your scientific community in one sentence. Our study highlights that marine scientists need to consider connectivity through time, to past populations, as well as space to better understand the genetic composition of present-day