CRISPR-Cas Diagnostics for Environmental Monitoring

In a special blog post, Molly-Ann Williams (@WilliamsMolly_9) and Anne Parle-McDermott (@anne_parle) from the School of Biotechnology and DCU Water Institute, Dublin City University, provide an overview of how CRISPR-Cas works and how it can be applied to ecology, and to monitoring in particular. Read their recently published Molecular Ecology Resources paper here.

The field of CRISPR-Cas for genome editing has simply exploded since its introduction in 2012. The discovery of many different Cas enzymes with additional natural or genetically engineered functionalities is resulting in an increase in CRISPR-Cas applications across all fields, from food security to medicine.

Number of Scopus search results for query “CRISPR” in given year. Search performed on 21 November 2019.

So how can we join the revolution and apply CRISPR-Cas to the field of Ecology?

CRISPR-Cas systems consist of two main elements: a guide and a nuclease. Guides (made of RNA) direct the nuclease (Cas enzyme) to specific nucleic acid sequences (DNA or RNA). Upon target recognition the nuclease carries out the desired response, most commonly cleavage of the target sequence. The initially discovered CRISPR-Cas system relied on a nuclease called Cas9. This enzyme is involved in highly specific cleavage of target sequences that allow genome editing to occur by activating the natural repair system of the cell. More recently the applications of this system have been expanded beyond genome editing by the discovery of several new Cas enzymes with a secondary function i.e., the indiscriminate cleavage of single stranded nucleic acids upon target recognition. The discovery of these Cas enzymes has revolutionised nucleic acid diagnostics due to two main features:

  1. Protein-guide and cleavage molecules (Cas): able to specifically recognise target nucleic acids, cleave the target sequence and subsequently cleave other non-specific nucleic acids.
  2. Nucleic acids as reporters: the non-specific nucleic acids can be designed as a reporter molecule that releases measurable signal when cleaved. This allows us to visualise when the initial target sequence has been detected and apply it to diagnostics and species monitoring.

Two main elements of a CRISPR-Cas diagnostic system: Cas enzyme and guide RNA effector complex and single stranded (ss) nucleic acid reporter molecule. In this example, the nuclease is Cas12a specific to DNA detection downstream from a TTTV PAM site. Adapted from Williams MA et al (2019).

The three main Cas enzymes of interest for diagnostics are Cas12, Cas13 and Cas14, each with unique functions applicable to different types of tests (for a more detailed discussion of these enzymes visit this blog).

The Cas enzyme most relevant for single species detection from environmental DNA is the enzyme Cas12a. This nuclease can detect both ssDNA and dsDNA but can only recognise DNA sequences downstream from a TTTV protospacer adjacent motif (PAM). Importantly, Cas12a cannot detect DNA sequences missing this PAM site. This is vital when designing single species detection assays.

Do you have two closely related species that you want to distinguish? Searching your target species sequence for a site downstream of a PAM site found ONLY in your target, and not in sympatric species, will ensure highly specific recognition and prevent detection of non-target species.
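As a rough illustration of that search, here is a minimal Python sketch. It is not the assay design procedure from our paper: the function names, the 20-nucleotide protospacer length and the exact-match comparison against non-target sequences are all assumptions made for the example. A real design would also scan both strands of the target and tolerate mismatches between guide and non-target sequences.

```python
import re


def reverse_complement(seq):
    """Reverse complement of a DNA string (only A, C, G, T are translated)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas12a_sites(seq, spacer_len=20):
    """List (pam_start, pam, protospacer) for every TTTV PAM on the given strand.

    V means A, C or G. The protospacer is taken as the spacer_len bases
    immediately downstream (3') of the PAM; spacer_len=20 is an assumption.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all found.
    for match in re.finditer(r"(?=(TTT[ACG]))", seq):
        start = match.start()
        protospacer = seq[start + 4 : start + 4 + spacer_len]
        if len(protospacer) == spacer_len:  # skip PAMs too close to the end
            sites.append((start, match.group(1), protospacer))
    return sites


def target_specific_sites(target, non_targets, spacer_len=20):
    """Keep target sites whose protospacer is not PAM-adjacent in any non-target.

    Non-targets are scanned on both strands. Comparison is by exact match only,
    which is a simplification.
    """
    shared = set()
    for other in non_targets:
        for strand in (other.upper(), reverse_complement(other)):
            for _, _, protospacer in find_cas12a_sites(strand, spacer_len):
                shared.add(protospacer)
    return [site for site in find_cas12a_sites(target, spacer_len)
            if site[2] not in shared]
```

Calling target_specific_sites(target_seq, [sympatric_seq_1, sympatric_seq_2]) with your own sequences returns the candidate sites that sit next to a TTTV PAM in the target but not in the other species, which is the property described above.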

What if you work with environmental RNA? Well, there is a CRISPR-Cas system for you too! The Cas enzyme Cas13 differs from Cas12a in that it recognises single-stranded RNA molecules, with non-specific cleavage of ssRNA following target cleavage, i.e., it works in the same way as Cas12a but targets RNA rather than DNA.

The world of CRISPR diagnostics is still in its early stages but with the discovery of new CRISPR-Cas systems with unique functions, there is no reason ecologists cannot utilise these diagnostic tools to enhance environmental monitoring using molecular techniques. For more information on using CRISPR-Cas diagnostics for single species detection from environmental DNA read our paper here.

Methods summary: Addressing (one of) the challenges of RADseq

Article by Evan McCartney-Melstad and Brad Shaffer from University of California at Los Angeles

RADseq is a great method for gathering genomic data to answer biological questions across many different scales, from phylogenetics to population and landscape genetics. It is fast, inexpensive, and requires no previous knowledge about the species’ genomic architecture. However, with this flexibility come challenges. In this paper we develop and bench test an approach to address what may be the biggest RADseq challenge: how to choose the right sequence similarity threshold that defines whether two non-identical sequencing reads arose from the same or different genomic locations. This problem goes to the heart of evolutionary genetics: if two sequences are considered to be homologous, or derived from the same ancestral genomic location with subsequent modification through time, then they tell us a great deal about evolutionary history. If they are paralogous, and map to separate locations, then they lack that shared evolutionary history. Getting this straight is perhaps the single most important step in using genomic data for evolutionary inference.

Heat maps showing pairwise data missingness at clustering thresholds of 88% (a) and 99% (b). 

Studies that include relatively distantly related samples, such as those asking phylogenetic or biogeographical questions, should expect that homologous sequences will have diverged over time and therefore require lower similarity thresholds that allow for that divergence. However, if the threshold is set too low, paralogs will be falsely assigned to the same genomic locus, leading to problems ranging from inflated missing data rates to inaccurate measures of genetic diversity. Rather than relying on rough guesses that are preset in software packages, our approach attempts to balance these two competing forces by quantifying the relationship between pairwise genetic relatedness (as estimated directly from the data) and summaries of the RADseq dataset including pairwise data missingness and the slope of isolation by distance among samples. The relationship between pairwise genetic distance and pairwise data missingness is particularly informative—although some positive correlation is expected as mutations accumulate in enzyme restriction sites that RAD relies on, there is often a clear pattern of increased pairwise missingness that occurs when the most divergent homologous allelic variants begin to be erroneously oversplit into different presumptive loci. By explicitly looking for this breakpoint as a function of clustering threshold, researchers can choose a value that allows them to maximize the number of genomic regions recovered while minimizing the erroneous oversplitting of highly divergent, but homologous loci.
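To make the idea of tracking this relationship across clustering thresholds concrete, here is a small Python sketch. It is only an illustration and not the pipeline from the paper: the input format (a dictionary mapping each clustering threshold to a list of (pairwise genetic distance, pairwise missingness) tuples), the function names, and the use of a simple least-squares slope as the summary are all assumptions, and the numbers are invented toy values rather than real results.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def missingness_slopes(pairs_by_threshold):
    """Slope of pairwise missingness on pairwise distance, per threshold.

    pairs_by_threshold is a hypothetical structure:
    {clustering_threshold: [(genetic_distance, missingness), ...]}
    with one tuple per pair of samples.
    """
    slopes = {}
    for threshold in sorted(pairs_by_threshold):
        distances, missingness = zip(*pairs_by_threshold[threshold])
        slopes[threshold] = ols_slope(distances, missingness)
    return slopes


# Invented toy numbers, for illustration only.
toy = {
    0.88: [(0.01, 0.10), (0.02, 0.12), (0.03, 0.14)],
    0.99: [(0.01, 0.10), (0.02, 0.20), (0.03, 0.30)],
}

for threshold, slope in missingness_slopes(toy).items():
    print(f"threshold {threshold:.2f}: slope = {slope:.2f}")
```

Plotting these slopes against the clustering threshold is one simple way to look for the point at which missingness starts to climb more steeply with genetic distance, which is the signature of oversplitting described above.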

Citation: McCartney-Melstad, E, Gidiş, M, Shaffer, HB. An empirical pipeline for choosing the optimal clustering threshold in RADseq studies. Mol Ecol Resour. 2019; 19: 1195–1204. https://doi.org/10.1111/1755-0998.13029

As genomic and ecological data sets grow larger in size, researchers are flooded with far more information than was available when many conventional model-based approaches were designed. To deal with these massive amounts of data, many researchers have turned to machine learning techniques, which promise to help find signals within the noise of the complex data sets generated by modern sequencing approaches. Applications for machine learning in molecular ecology are broad and include global studies of biodiversity patterns, species delimitation studies, and studies of the genomic architecture of adaptation, among many others. Here at Molecular Ecology Resources, we are excited to highlight research that applies supervised and unsupervised machine learning algorithms to answer questions of interest to the readership of molecular ecology. This special issue will also highlight the nuances and limitations of machine learning techniques. Rather than focusing on the supposed differences between machine learning and model-based approaches, this issue aims to highlight the broad spectrum of machine learning approaches, many of which can incorporate model-based expectations and predictions.

We are soliciting original research that presents novel, robust applications of machine learning methods to molecular data to address questions across ecological disciplines.

Details

Manuscripts should be submitted in the usual way through the Molecular Ecology Resources website. Submissions should clearly state in the cover letter accompanying the submission that you wish the manuscript to be considered for publication as part of this special issue. Pre-submission inquiries are not necessary, but any questions can be directed to: manager.molecol@wiley.com

Special issue editors: Nick Fountain-Jones, Megan Smith & Frédéric Austerlitz

Intra-specific variation and the algal microbiome

Individuals within a species vary, and this variation can have important implications for the role a species may play within ecosystems. We compared the relative importance of variation within a species due to genetic changes within its own genome versus symbiotic interactions between the focal species and its associated bacteria, also called its microbiome. We focused on Microcystis aeruginosa, a globally distributed photosynthetic cyanobacterium (cyanobacteria are also known as blue-green algae) that often dominates freshwater harmful algal blooms.

Colony of Microcystis aeruginosa from Gull Lake. Colony photographed by O. Sarnelle of Michigan State University and image prepared by John Megahan of University of Michigan.

These blooms have recently become more common and intense worldwide, causing major economic and ecological damage. We studied Microcystis and their associated microbiomes from lakes in Michigan, USA that vary in phosphorus content, which is the primary limiting nutrient in lakes. We found genomic changes among strains of Microcystis along this phosphorus gradient that indicated increased efficiency in the use of phosphorus and nitrogen. Intriguingly, we found that genotypes adapted to different nutrient environments co-occurred in phosphorus-rich lakes. This co-occurrence may have critical implications for understanding how Microcystis blooms persist for many months, long after nutrients become depleted within lakes. Similar to previous findings in, for example, the human microbiome, we found that the bacteria comprising the microbiomes of Microcystis varied in community composition but were more stable at the level of functional contributions to their hosts across the phosphorus gradient. Finally, while our work was mostly focused on unraveling the genomic underpinnings of nutrient adaptation, we also observed consequences of these differences in Microcystis genome and microbiome composition at a physiological level. In particular, when nutrients were provided in abundance, Microcystis (and its microbiome) that had evolved to thrive in low-phosphorus environments could not grow as rapidly as strains from high-phosphorus environments.

Sara Jackrel, Postdoctoral Fellow, University of Michigan.

Read the full article here.

Citation: Jackrel, SL, White, JD, Evans, JT, et al. Genome evolution and host-microbiome shifts correspond with intraspecific niche divergence within harmful algal bloom-forming Microcystis aeruginosa. Mol Ecol. 2019; 28: 3994–4011. https://doi.org/10.1111/mec.15198

Interview with the Author: Conservation of old individual trees and small populations is integral to maintain species’ genetic diversity of a historically fragmented woody perennial

What is the unit of conservation? Is it similar for different types of plants? How can the reproductive biology of the organism inform best practices in conserving threatened species? In her doctoral research, Nicole Bezemer is studying Eucalyptus species from south-western Australia to better understand population dynamics in long-lived organisms and how this can lead to better management of their populations. Surprisingly, many of the small and fragmented populations of the two subspecies of E. caesia she studied are genetically differentiated at a fine spatial scale, and high levels of heterozygosity persist even in populations of only a dozen individuals. Nicole and colleagues suggest that the clonal and perennial nature of E. caesia might contribute to these unusual patterns of genetic diversity and divergence, and that traditional conservation genetic approaches might be detrimental for naturally fragmented species with these life-history characteristics. Read here about her experience in developing this research.

A multi-stemmed genet of Eucalyptus caesia at Mocardy Hill, Western Australia. Photo by NB.

What led to your interest in this topic / what was the motivation for this study? 
Eucalyptus caesia is an intriguing study species, given the combination of a distribution on scattered granite outcrops, a long history of geographic and genetic insularity, a capacity for individual longevity via lignotuber re-sprouting, a lack of recent recruitment in most known stands, and adaptation for pollination by nectarivorous birds. After completing my Honours research at the Boyagin stand of E. caesia, I was hooked. The present study came to fruition upon discovering that one of my PhD experiments, involving 6 months of controlled cross-pollinations, had been killed by a series of frosts. I had already genotyped two large stands of E. caesia and I was curious about what patterns of genetic structure might exist in other stands, and across the species’ landscape distribution.

What difficulties did you run into along the way? 
Some stands of E. caesia are located on immense granite outcrops, often hidden in hard-to-access gullies or behind thick barricades of vegetation. The first challenging aspect of the project was to find the sub-populations of E. caesia at each new location. For many populations, I did so by embarking on a Google Earth tour led by my supervisor, Steve Hopper, who has worked on the granite outcrop flora of south-west Australia and on E. caesia for nearly four decades. Nonetheless, I spent many hours traversing granite outcrops, sometimes in circles, which occasionally led to finding additional plants or, in the case of the E. caesia at Old Muntadgin, a previously undocumented population of several hundred plants.

What is the biggest or most surprising innovation highlighted in this study? 
I was surprised by the apparent lack of genetic interconnection between some stands over relatively small spatial scales. Given the long history of population fragmentation and reproductive biology of E. caesia (multiple modes of reproduction and gravity-dispersed seed), I anticipated that high levels of genetic differentiation would feature. Regardless, it was surprising to find that, in some instances, the level of genetic differentiation within stands exceeded that among stands. Another interesting result revealed by comprehensive genotyping was the very small census population size of some stands. Seven stands comprised fewer than ten unique multi-locus genotypes, and three locations had only one or two genotypes. Localised clonal reproduction is clearly of paramount importance to the persistence of these stands.

Moving forward, what are the next steps in this area of research?
The next step is to further test the genetic integrity of the two subspecies, E. caesia subsp. caesia and E. caesia subsp. magna, by genotyping plants from additional stands. Walyamoning and Yanneymooning are geographical outliers to other stands of subsp. caesia and occur within relatively close proximity to the group of subsp. magna populations located in the north-east of the species distribution. We propose to genotype a sample of individuals from the two outlier stands of subsp. caesia, and at three additional locations of subsp. magna, to test whether the two subspecies are genetically distinct even when populations are sympatric, and to determine if hybridisation has occurred.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 
My message to other young or early-career researchers is to have a clear research outcome in mind before exploring the application of novel techniques. Avoid putting yourself in the position of having to come up with a hypothesis after the fact.

What have you learned about methods and resources development over the course of this project? 
Comprehensive genotyping at multiple spatial scales may provide a more complete picture of spatial genetic structure compared to studies where sampling efforts are focused on few individuals from many populations, or on many individuals from few populations. There is still much to be gained from population genetic studies, especially in understudied, biodiverse, endemism hotspots such as granite outcrops, and in understudied systems such as small, historically fragmented populations of long-lived trees.

Describe the significance of this research for the general scientific community in one sentence.
Anciently fragmented plant populations may be adept at persisting as small populations with low genetic diversity and limited genetic interconnection, and therefore attempts to connect such populations may be ineffective or even harmful.

Describe the significance of this research for your scientific community in one sentence.
Small populations of long-lived woody perennial plants, even those comprising a handful of individuals, may contain unique genotypes that contribute to overall species genetic diversity, and are worthy of conservation.

Enjoying the afternoon light from my field base camp underneath Eucalyptus caesia at Boyagin Rock. Photo by NB.

Interview with the author: A guide to the application of Hill numbers to DNA based diversity analyses

Diversity assessment procedures in traditional and DNA sequencing‐based approaches. Recorded entities need to be classified into types, before each type is weighed according to its relative abundance and the order of diversity (q). Note the example refers to an abundance‐based, rather than incidence‐based, approach

What are Hill numbers? What do they have to do with estimating biodiversity? How can you use them as a molecular ecologist? Read the recent review in Molecular Ecology Resources by Antton Alberdi and Thomas Gilbert on this topic, and read the interview with Antton below to learn how they think about Hill numbers and their applications to metabarcoding. Also, check out hilldiv, “an R package to assist analysis of diversity for diet reconstruction, microbial community profiling or more general ecosystem characterisation analyses based on Hill numbers, using OTU tables and associated phylogenetic trees as inputs. The package includes functions for (phylo)diversity measurement, (phylo)diversity profile plotting, (phylo)diversity comparison between samples and groups, (phylo)diversity partitioning and (dis)similarity measurement. All of these grounded in abundance-based and incidence-based Hill numbers.”

What led to your interest in this topic / what was the motivation for this study? 
Measuring, estimating and contrasting biological diversity are central operations in most ecological studies. In recent decades, dozens of diversity indices and metrics have been proposed, each with their individual strengths and weaknesses, and specific mathematical assumptions. The measures that many of them yield are difficult to interpret, because the values might refer to abstract units, which lack a straightforward interpretation for non-specialists. We believe that the statistical framework developed around the Hill numbers overcomes many of these problems, and provides a statistical toolset that is extremely useful for ecologists. Besides, Hill numbers make it possible to incorporate complementary information, such as phylogenetic dissimilarities across organisms, which is really handy for molecular ecologists, who can easily build phylogenetic trees from metabarcoding data.

What difficulties did you run into along the way? 
We are a molecular ecologist and an evolutionary biologist who use many different mathematical tools, but are not expert mathematicians. Hence, one of the main challenges was to make sure that all the statements and mathematical interpretations were correct!

What is the biggest or most surprising innovation highlighted in this study?
The aim of our review was to demonstrate to ecologists, who like us might have a limited mathematical background, that implementing the framework developed around the Hill numbers is not difficult, and has big potential gains. In our review we gathered information and tools generated by others, mainly Lou Jost, Anne Chao and Chun-Huo Chiu, and presented them in a comprehensive way for molecular ecologists. We have tried to explain complex mathematical formulations in layman's terms, exactly as we would like others to explain to us topics we are not familiar with. We have provided examples and pieces of code that we hope will encourage other researchers to use these tools.

Moving forward, what are the next steps in this area of research?
Our article mainly focuses on diversity measurement from data generated using DNA metabarcoding. While bioinformatic methods to generate metabarcoding data have received much attention in the last decade, the impact of the statistical approaches used to analyse diversity has been less studied. Assessing their impact and providing guidelines for selecting the tool best suited to address specific questions with specific types of data will be an important next step in the area of metabarcoding-based diversity analyses.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 
Despite the fact that they might at first seem complex and abstract, bioinformatic and statistical tools are necessary to address ecological questions. Hence, we would encourage students to try to understand the basic bioinformatic and statistical procedures, so as to be able to select the best tools to address their research questions.

Differences between abundance-based and incidence-based Hill numbers. The Hill numbers yielded for the entire system differ depending on the approach employed. In abundance-based approaches, the DNA sequence is the unit on which diversity is computed, while in incidence-based approaches, the sample is the unit on which diversity is measured. (*) The asterisk indicates that the equations are undefined for q = 1; in practice, either the 1D formula shown in Table 1 or a value of q approaching unity (for example, q = 0.9999) must be used. However, q = 1 is used for the sake of simplicity.
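For readers who want to see the arithmetic, here is a minimal Python sketch of the standard abundance-based Hill number of order q, which is (sum of p_i^q)^(1/(1-q)) for relative abundances p_i, with the q = 1 case handled as the exponential of Shannon entropy. The function name and the toy read counts are our own for illustration. This is not code from the hilldiv package, and it does not cover the incidence-based or phylogenetic versions.

```python
import math


def hill_number(abundances, q):
    """Abundance-based Hill number of order q for one sample.

    abundances: counts (e.g. reads per OTU); zero counts are ignored.
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        # The general formula is undefined at q = 1; its limit is the
        # exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


# Toy read counts for three OTUs in one sample (invented for illustration).
sample = [90, 5, 5]
for q in (0, 0.9999, 1, 2):
    print(q, hill_number(sample, q))
```

With q = 0 every type counts equally, so the result is simply the number of types present. As q increases, rare types are given less and less weight. Comparing the values printed for q = 0.9999 and q = 1 shows the approximation mentioned in the caption.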

What have you learned about methods and resources development over the course of this project?
That the most broadly employed tools are not always the best way to address scientific questions!

Describe the significance of this research for your scientific community in one sentence.
Hill numbers provide powerful, solid and versatile tools with which to carry out most of the analyses that are needed to assess biological diversity within a common statistical framework.