Methods summary: Addressing (one of) the challenges of RADseq

Article by Evan McCartney-Melstad and Brad Shaffer from University of California at Los Angeles

RADseq is a great method for gathering genomic data to answer biological questions across many different scales, from phylogenetics to population and landscape genetics. It is fast, inexpensive, and requires no previous knowledge about the species’ genomic architecture. However, with this flexibility come challenges. In this paper we develop and bench test an approach to address what may be the biggest RADseq challenge: how to choose the right sequence similarity threshold that defines whether two non-identical sequencing reads arose from the same or different genomic locations. This problem goes to the heart of evolutionary genetics: if two sequences are homologous, that is, derived from the same ancestral genomic location with subsequent modification through time, then they tell us a great deal about evolutionary history. If they are paralogous, and map to separate locations, then they lack that shared evolutionary history. Getting this straight is perhaps the single most important step in using genomic data for evolutionary inference.

Heat maps showing pairwise data missingness at clustering thresholds of 88% (a) and 99% (b). 

Studies that include relatively distantly related samples, such as those asking phylogenetic or biogeographical questions, should expect that homologous sequences will have diverged over time and therefore require lower similarity thresholds that allow for that divergence. However, if the threshold is set too low, paralogs will be falsely assigned to the same genomic locus, leading to problems ranging from inflated missing data rates to inaccurate measures of genetic diversity. Rather than relying on rough guesses that are preset in software packages, our approach attempts to balance these two competing forces by quantifying the relationship between pairwise genetic relatedness (as estimated directly from the data) and summaries of the RADseq dataset, including pairwise data missingness and the slope of isolation by distance among samples. The relationship between pairwise genetic distance and pairwise data missingness is particularly informative. Although some positive correlation is expected as mutations accumulate in the restriction sites that RAD relies on, there is often a clear pattern of increased pairwise missingness that occurs when the most divergent homologous allelic variants begin to be erroneously oversplit into different presumptive loci. By explicitly looking for this breakpoint as a function of clustering threshold, researchers can choose a value that allows them to maximize the number of genomic regions recovered while minimizing the erroneous oversplitting of highly divergent, but homologous, loci.
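To make the idea of tracking missingness against genetic distance more concrete, here is a minimal Python sketch. It is not the pipeline from the paper: the function names, the presence/absence input format, and the particular definition of pairwise missingness (loci not shared by a pair, out of loci recovered in at least one of the two samples) are all assumptions made for illustration. It computes, for each clustering threshold, the slope of pairwise missingness on pairwise genetic distance; a sharp rise in that slope at higher thresholds would be one possible sign of oversplitting.

```python
import numpy as np


def pairwise_missingness(present):
    """Pairwise data missingness from a presence/absence matrix.

    present: (n_samples, n_loci) array, truthy where a locus was recovered.
    Assumed definition (illustrative only): for each pair of samples, the
    fraction of loci recovered in at least one of them that is NOT shared
    by both. Pairs with no loci at all get a value of 1.0.
    """
    p = np.asarray(present, dtype=int)
    shared = p @ p.T                          # loci recovered in both samples
    counts = p.sum(axis=1)                    # loci recovered per sample
    union = counts[:, None] + counts[None, :] - shared
    return 1.0 - shared / np.maximum(union, 1)


def slope_vs_distance(dist, values):
    """Least-squares slope of a pairwise summary on pairwise genetic distance.

    Only the upper triangle is used, so each pair of samples counts once.
    """
    dist = np.asarray(dist, dtype=float)
    values = np.asarray(values, dtype=float)
    iu = np.triu_indices_from(dist, k=1)
    slope, _intercept = np.polyfit(dist[iu], values[iu], 1)
    return slope


def missingness_slopes(datasets):
    """datasets: dict mapping clustering threshold -> (dist, present).

    Both inputs are hypothetical: dist is a pairwise genetic distance
    matrix and present a presence/absence matrix, each produced by
    assembling the data at that threshold.
    """
    return {
        threshold: slope_vs_distance(dist, pairwise_missingness(present))
        for threshold, (dist, present) in datasets.items()
    }
```

In practice one would assemble the same reads at a series of thresholds, pass the resulting matrices to `missingness_slopes`, and inspect how the slope changes across thresholds, ideally alongside plots of the raw pairwise values rather than the slope alone.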

Citation: McCartney‐Melstad, E, Gidiş, M, Shaffer, HB. An empirical pipeline for choosing the optimal clustering threshold in RADseq studies. Mol Ecol Resour. 2019; 19: 1195– 1204. https://doi.org/10.1111/1755-0998.13029

Methods summary: Applying CRISPR to detect eDNA

Article by Molly-Ann Williams and Anne Parle-McDermott both from Dublin City University

We were challenged to design and build a simple and rapid species monitoring system. Why do we need such a system? Biodiversity loss is at an all-time high, and such a system would help to support the management and conservation of fish species within aquatic environments by acquiring knowledge of species distribution that traditionally is gained through visual detection and counting. These traditional methods are expensive, time-consuming and can harm the species of interest. We decided that environmental DNA (eDNA) was the way to go, but we had to solve the ‘PCR problem’, i.e., avoid the need for cycling through high temperatures, as that would leave us with a costly, one-off device that would likely not be applied outside our lab. This got us brainstorming and led us to a novel isothermal detection method, combining Recombinase Polymerase Amplification with CRISPR-Cas detection, which simplifies the adaptation of nucleic acid detection onto a biosensor device.

This innovative methodology utilises the collateral cleavage activity of Cas12a, an RNA-guided nuclease directed by a highly specific single CRISPR RNA, to detect specific species from eDNA. We proved it could work for eDNA by applying the technology to the detection of Salmo salar from eDNA samples collected in Irish rivers, where presence or absence had been previously confirmed using conventional field sampling. The beauty of this advance is that it can be applied to any species in the environment. Not only does this assay solve the ‘PCR problem’, it is also a better approach for distinguishing very closely related species. We look forward to others in the field adapting it to their own favourite species of interest.

Citation: Williams, M‐A, O’Grady, J, Ball, B, et al. The application of CRISPR‐Cas for single species identification from environmental DNA. Mol Ecol Resour. 2019; 19: 1106– 1114. https://doi.org/10.1111/1755-0998.13045

Interview with the author: Sociality, hyenas and DNA methylation

The addition of methyl groups to a DNA molecule, or methylation, has the interesting ability to alter the activity of a DNA segment without changing the sequence. In this behind-the-scenes look, Zachary Laubach and colleagues test whether this valuable biomarker is affected by differences in hyena social status or other ecological factors early in life. What’s particularly impressive is that they garnered insights into methylation from a wild population. They find some surprising results, such as that high-ranking mums can confer higher levels of methylation on their cubs, an effect that disappears when the cubs get older. Why? Find out below and read the full article here.

Photo credit: Zach Laubach

What led to your interest in this topic / what was the motivation for this study? 

Across a broad taxonomic spectrum, social experiences, particularly those early in life, seem to have a profound impact on organisms’ development. The idea that during sensitive periods of development, social experiences and early life environment can have lasting impacts on the later life phenotype and health is known as the Developmental Origins of Health and Disease (DOHaD) hypothesis, and was formalized in the 1980s by epidemiologists, notably David Barker through his research on cardiovascular disease. Among social mammals, including humans and non-human primates, an individual’s social rank affects their behavior, physiology, and related health outcomes. For example, in humans, low socioeconomic status is widely recognized as a risk factor for cardiovascular complications and other chronic diseases. In non-human primates, low social rank is a risk factor for elevated chronic stress and immune dysregulation. So, although we observe that social status affects biology, we still know little about how this all works. To better understand a potential mechanism for how early life environment affects biology, we investigated possible early environmental determinants of a molecular biomarker (DNA methylation) over the course of development in a population of wild spotted hyenas. Similar to many primates, hyenas live in groups organized by a social dominance hierarchy, and whether a hyena is born high or low ranking has lifelong consequences.

What difficulties did you run into along the way? 

In this study, we focused on measuring DNA methylation, which is generally of interest to researchers because it is responsive to environmental stimuli and associated with gene expression. Still, while spotted hyenas present a unique opportunity to investigate how various social experiences and ecological factors early in life are associated with biological characteristics later in life, there were no previous studies (at least of which we were aware) that measured DNA methylation in this species. In other words, this was not like working with a well-characterized model organism, like fruit flies or lab rats. In fact, when we were conducting our lab work there was no publicly available draft hyena genome. In our attempt to assess a potentially informative biomarker in hyenas, we measured multiple types of DNA methylation with varying degrees of success. Finally, the hyenas we study live freely in a large reserve in Kenya, so much of our data were observational and collected under a variety of field conditions, making collection of samples non-trivial.

What is the biggest or most surprising innovation highlighted in this study? 

This work represents one of a handful of studies conducted in a wild population that measures DNA methylation to better understand how early life environment may influence organisms’ biology over the course of development. Taking advantage of our approximately 30 years’ worth of continuously collected data on individually recognizable hyenas from the Masai Mara Hyena Project, we not only amassed a particularly large sample size for a long-lived, wild mammal, but we were also able to compare patterns of DNA methylation at various stages of development with respect to multiple early life environmental factors. We found that being born to a higher-ranking mom corresponded with greater global DNA methylation in young but not older hyenas. One interpretation of this result is that high ranking moms confer some advantage to their cubs early in life, but that the effect of maternal rank per se is not evident in global DNA methylation of subadult or adult hyenas. We also found some associations between global DNA methylation and litter size, human disturbance, and prey availability in the year a hyena was born, and these associations were strongest in the youngest age group of hyenas.

Moving forward, what are the next steps in this area of research?

In our next steps we are working to understand whether specific types of early life social environments, like maternal care and how well socially connected an animal is within its group, correspond with variation in DNA methylation and adult stress. We are also utilizing more advanced techniques for measuring DNA methylation, so that we might home in on functional pathways that are involved in the development of an adverse stress phenotype. As part of our broader research agenda looking at general biological principles related to the DOHaD hypothesis, we have also teamed up with epidemiologists to ask how social status in humans affects biology. In fact, we have recently published another paper looking at the associations between maternal socioeconomic status and patterns of DNA methylation over the course of development in children who are part of the Project Viva pre-birth cohort study (check out the paper here).

What would your message be for students about to start developing or using novel techniques in Molecular Ecology?

This project was part of my PhD work, and from this experience I have learned just how fast molecular biology advances as a field. Given that this technology is constantly changing, it is critical to find mentors and collaborators with up-to-date expertise who are willing to support you. I was fortunate to work in a cutting-edge molecular laboratory, and to receive training from internationally recognized experts in Dr. Dana Dolinoy’s lab who specialize in studying DNA methylation. Additionally, in studies like these that involve large observational data sets and that aim to understand biological mechanisms, the value of clearly defined study questions, hypotheses and a complementary analytical strategy cannot be overstated. In my opinion, novel technology will not substitute for a thoughtful and well-planned analysis.

What have you learned about methods and resources development over the course of this project? 

Working in a novel system, like investigating DNA methylation in wild spotted hyenas, presents challenges and limitations that are distinct from those encountered in laboratory settings and when working with model organisms. However, there are deep insights and rich perspective to be gained at the three-way interface between molecular biology, behavioral ecology and evolutionary biology from study populations that have intact life histories and are subject to natural selection. I have also learned that long-term field studies with uninterrupted data collection, like the Masai Mara Hyena Project, provide an invaluable resource and an unmatched opportunity to combine molecular techniques with vast collections of behavioral, demographic and ecological data. In addition, while long-term field studies represent a substantial investment of time and resources, they also present a chance for comparative research that can help elucidate basic biological principles that span taxa, like the DOHaD hypothesis. As such, I believe I have been fortunate to work with Dr. Kay Holekamp’s hyenas and that these types of long-term field studies are an asset to be prioritized and preserved.

Describe the significance of this research for the general scientific community in one sentence.

Social and ecological factors experienced early in life can correspond to changes in molecular biomarkers, like DNA methylation, that are detected over the course of development, and that may affect patterns of gene expression.

Describe the significance of this research for your scientific community in one sentence.

Findings from this research suggest that maternal rank, anthropogenic disturbance, and prey availability around the time of birth are associated with later life global DNA methylation in spotted hyenas, particularly in cubs.

Citation: Laubach, ZM, Faulk, CD, Dolinoy, DC, et al. Early life social and ecological determinants of global DNA methylation in wild spotted hyenas. Mol Ecol. 2019; 28: 3799– 3812. https://doi.org/10.1111/mec.15174

National-scale eDNA metabarcoding study reveals diversity patterns of plant pathogens and how they change with land use

Plant pathogens are a major factor in farming and forestry, and also play a key role in ecosystem health. Understanding pathogens at national scales is critical for appropriate prevention and management strategies and for a sustainable provision of future ecosystem services and agroecosystem productivity. Despite this, at present we have little knowledge of the diversity patterns of plant pathogens and how they change with land use at a broad scale.

Photo credit: Ian Dickie

In our study we show how land uses such as farming and plantation forestry affected the variety of plant pathogens in soil, roots and on plant leaves – and we show there are many more species of plant pathogens in land that’s been modified by pasture, cropping, and plantation forestry than there are in natural forest. The patterns of pathogen diversity are distinct from other microbes.

These are some of the first landscape level insights into these critically important communities including fungal, oomycete and bacterial pathogens in seemingly healthy ecosystems. Our results give scientists new insights into where pathogens exist, and how pathogen communities are structured.

Andreas Makiola and Ian Dickie (Bio-Protection Research Centre, New Zealand)

Read the full article here

As genomic and ecological data sets grow larger in size, researchers are flooded with far more information than was available when many conventional model-based approaches were designed. To deal with these massive amounts of data, many researchers have turned to machine learning techniques, which promise to help find signals within the noise of the complex data sets generated by modern sequencing approaches. Applications for machine learning in molecular ecology are broad and include global studies of biodiversity patterns, species delimitation studies, and studies of the genomic architecture of adaptation, among many others. Here at Molecular Ecology Resources, we are excited to highlight research that applies supervised and unsupervised machine learning algorithms to answer questions of interest to the readership of molecular ecology. This special issue will also highlight the nuances and limitations of machine-learning techniques. Rather than focusing on the supposed differences between machine-learning and model-based approaches, this issue will aim to highlight the broad spectrum of machine-learning approaches, many of which can incorporate model-based expectations and predictions.

We are soliciting original research that presents novel, robust applications of machine learning methods to molecular data to address questions across ecological disciplines.

Details

Manuscripts should be submitted in the usual way through the Molecular Ecology Resources website. Submissions should clearly state in the cover letter accompanying the submission that you wish the manuscript to be considered for publication as part of this special issue. Pre-submission inquiries are not necessary, but any questions can be directed to: manager.molecol@wiley.com

Special issue editors: Nick Fountain-Jones, Megan Smith & Frédéric Austerlitz

Intra-specific variation and the algal microbiome

Individuals within a species vary, and this variation can have important implications for the role a species may play within ecosystems. We compared the relative importance of variation within species due to genetic changes within its own genome versus symbiotic interactions between the focal species and its associated bacteria, also called their microbiome. We focused on Microcystis aeruginosa, a globally distributed photosynthetic cyanobacterium, also known as blue-green algae, that often dominates freshwater harmful algal blooms.

Colony of Microcystis aeruginosa from Gull Lake. Colony photographed by O. Sarnelle of Michigan State University and image prepared by John Megahan of University of Michigan.

These blooms have recently become more common and intense worldwide, causing major economic and ecological damage. We studied Microcystis and their associated microbiomes from lakes in Michigan, USA that vary in phosphorus content, which is the primary limiting nutrient in lakes. We found genomic changes among strains of Microcystis along this phosphorus gradient that indicated increased efficiency in the use of phosphorus and nitrogen. Intriguingly, we found that genotypes adapted to different nutrient environments co-occurred in phosphorus‐rich lakes. This co-occurrence may have critical implications for understanding how Microcystis blooms persist for many months, long after nutrients become depleted within lakes. Similar to previous findings in, for example, the human microbiome, we found that the bacteria comprising the microbiomes of Microcystis varied in community composition but were more stable at the level of functional contributions to their hosts across the phosphorus gradient. Finally, while our work was mostly focused on unraveling the genomic underpinnings of nutrient adaptation, we also observed consequences of these differences in Microcystis genome and microbiome composition at a physiological level. In particular, when nutrients were provided in abundance, Microcystis (and its microbiome) that had evolved to thrive in low-phosphorus environments could not grow as rapidly as strains from high-phosphorus environments.

Sara Jackrel, Postdoctoral Fellow, University of Michigan.

Read the full article here.

Citation: Jackrel, SL, White, JD, Evans, JT, et al. Genome evolution and host‐microbiome shifts correspond with intraspecific niche divergence within harmful algal bloom‐forming Microcystis aeruginosa. Mol Ecol. 2019; 28: 3994– 4011. https://doi.org/10.1111/mec.15198