Interview with the chief editor: Molecular Ecology Resources (MER)

In this special New Year's post we interview the Chief Editor of MER, Shawn Narum. Shawn, based at the Columbia River Inter-Tribal Fish Commission and the University of Idaho, has been Chief Editor for over 5 years. In this interview we get his perspective on the journal and the field in general, as well as his advice for early career researchers.

See this link for a past interview with Shawn all the way back in 2014 with the Molecular Ecologist and this link for his 2020 editorial.

What are some of the main changes you have witnessed in the field of molecular ecology since you became Chief Editor of MER?

The advancement of molecular and statistical methods has driven the field of molecular ecology to new heights. Questions that were previously out of reach can now be addressed for most non-model species with careful study design.

What methods and resources do you think the field needs in the future?

Advances in sequencing methods have led to fascinating discoveries of candidate genes associated with local adaptation and phenotypic variation in many species, but development of candidate markers for intensive testing and validation is lacking. For example, bioinformatic resources are needed that efficiently and accurately develop primers/baits for specific subsets of markers that can be genotyped cost-effectively in many individuals (e.g., Meek & Larson, 2019).

What are some of your favourite scientific discoveries from the past two decades?

Genomic islands of divergence are real! These islands often occur as inversions with low recombination that drive life history variation in organisms ranging from plants (Hoffmann & Rieseberg, 2008) to birds (Lamichhaney et al., 2016) and fish (Jones et al., 2012).

As a fish geek, I also very much enjoyed the discovery that there is a warm-blooded fish! It has long been known that some species like tuna and swordfish exhibit partial endothermy in brain tissue, but the discovery of whole-body endothermy in opah living in cold, deep seas makes me smile (Wegner et al., 2015).

What advice would you give students wanting to develop a career in science?

Establish close collaborations with colleagues that you trust and nurture those relationships for the long-term.

What advice would you give to your younger-self about science and life?

Seize opportunities to work with others in a team environment, but it is OK to turn down some opportunities when there is already too much on your plate. “Too much” is when you can’t keep up with expectations that you have for yourself or projects substantially interfere with spending time with the people you love.

What is your writing style like? Do you have some favourite writers that inspired you earlier on during your career?

My writing tends to be structured following a mental or written outline for clearly defined study questions. I have always been inspired by papers coming from Louis Bernatchez and have been grateful to have co-authored a few recent articles with him.

What are some of the aspects of your job as a scientist that you enjoy the most?

Two of the most rewarding aspects of my work are being involved with the development of young scientists and making new genomic discoveries that contribute towards conservation and recovery of naturally occurring species.

Outside of sequencing, what is your favourite methodological advance in the last five years?

Statistical advances that improve the signal-to-noise ratio in order to reduce false positives are critical to our field. One such approach, called “Local score”, was developed by Fariello et al. (2017) to account for linked SNPs from high-density genome scans to yield strong candidates (after Bonferroni correction). This is a powerful approach to detect adaptive genetic variation.
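To make the idea concrete, here is a minimal sketch of how a local score can be accumulated along a chromosome. It assumes the commonly described formulation in which each SNP receives a score of -log10(p) minus a tuning parameter, and scores are summed with a running total that resets to zero whenever it would go negative. The function name, the default `xi=1.0`, and the toy p-values are illustrative assumptions, and the significance thresholds derived by Fariello et al. are not reproduced here.

```python
import math

def local_score(p_values, xi=1.0):
    """Return (max_score, start_index, end_index) of the best-scoring run.

    Each SNP gets a score of -log10(p) - xi, so only p-values below
    10**-xi contribute positively. The running total h follows
    h[i] = max(0, h[i-1] + score[i]) and therefore resets to zero
    whenever the accumulated signal is lost.
    """
    best, best_start, best_end = 0.0, None, None
    h, start = 0.0, 0
    for i, p in enumerate(p_values):
        if h == 0.0:
            start = i  # a new candidate region begins here
        h = max(0.0, h + (-math.log10(p) - xi))
        if h > best:
            best, best_start, best_end = h, start, i
    return best, best_start, best_end

# Toy example: several moderately small p-values in a row reinforce each other.
toy_p_values = [0.5, 0.001, 0.01, 0.5, 0.0001, 0.9]
score, first_snp, last_snp = local_score(toy_p_values, xi=1.0)
```

In this toy case the region spanning SNPs 1 to 4 accumulates a higher score than any single SNP could on its own, which is the sense in which linked signals are pooled.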

References

Meek, M. H., & Larson, W. A. (2019). The future is now: amplicon sequencing and sequence capture usher in the conservation genomics era. Molecular Ecology Resources, 19, 795–803.
https://doi.org/10.1111/1755-0998.12998

Hoffmann, A. A., & Rieseberg, L. H. (2008). Revisiting the impact of inversions in evolution: from population genetic markers to drivers of adaptive shifts and speciation? Annual Review of Ecology, Evolution, and Systematics, 39, 21–42.
https://doi.org/10.1146/annurev.ecolsys.39.110707.173532

Lamichhaney, S., Fan, G., Widemo, F., Gunnarsson, U., Thalmann, D. S., Hoeppner, M. P., … & Chen, W. (2016). Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nature Genetics, 48(1), 84.
https://doi.org/10.1038/ng.3430

Jones, F. C., Grabherr, M. G., Chan, Y. F., Russell, P., Mauceli, E., Johnson, J., … & Birney, E. (2012). The genomic basis of adaptive evolution in threespine sticklebacks. Nature, 484(7392), 55.
https://doi.org/10.1038/nature10944

Wegner, N. C., Snodgrass, O. E., Dewar, H., & Hyde, J. R. (2015). Whole-body endothermy in a mesopelagic fish, the opah, Lampris guttatus. Science, 348(6236), 786-789.
https://doi.org/10.1126/science.aaa8902

Fariello, M. I., Boitard, S., Mercier, S., Robelin, D., Faraut, T., Arnould, C., … & Gourichon, D. (2017). Accounting for linkage disequilibrium in genome scans for selection without individual genotypes: the local score approach. Molecular Ecology, 26(14), 3700–3714.
https://doi.org/10.1111/mec.14141

Interview with the author: Using host transcriptomics to sample blood parasites

Hosts offer diverse habitats for an incredibly rich array of microbial groups. Genomic resources for many groups residing within hosts (‘infra-communities’) are often poor because of the difficulty of separating the DNA of the microbe from that of the host, particularly for species living within host cells. In this interview we go behind the scenes with Spencer Galen as he guides us through the transcriptomic approach he developed with colleagues to sample blood parasites such as malaria. Given how ubiquitous and important these parasites can be for animal health, this resource has the potential to pave the way for important advances in disease ecology. Read the paper here.

Avian blood transcriptomes revealed that hosts often have far more complex parasite communities than traditionally thought. For instance, the transcriptome of this Baltimore oriole (Icterus galbula) revealed at least six malaria parasite infections from three malaria parasite genera. The blood smear image from this bird shows the three genera in close contact within the host bloodstream. L: Leucocytozoon, PL: Plasmodium, PA: Parahaemoproteus.
Credit: Spencer Galen

What led to your interest in this topic / what was the motivation for this study? 

This study began with two classic ingredients of scientific discovery: a lot of frustration mixed with a bit of inspiration from other researchers. The frustration was born from a lack of available genetic resources for malaria parasites and other blood parasites, which I felt was hindering the kind of research that I wanted to do. The inspiration came during the first year of my PhD, when several papers were published within a span of just a few months showing that researchers were passively generating large quantities of blood parasite genomic data by sequencing the transcriptomes of their vertebrate hosts. My PhD advisor Susan Perkins and I thought that designing a study to explore this approach in more detail could solve some of my frustrations and help the field of blood parasite research at large.

What difficulties did you run into along the way? 

When we started this project there was always the looming possibility that we would sequence a number of host transcriptomes that were infected with blood parasites and simply not recover any useful parasite data. Even a small-scale transcriptomic project is not a trivial matter financially, and so I will admit that I lost some sleep wondering if this project was a bad idea. Fortunately, field and lab work went quite smoothly, and the results of my first scan for parasites within our initial test transcriptomes exceeded my wildest expectations. And so in reality the biggest challenge was my own self-doubt – if I had paid too much attention to those thoughts, this project might not have gotten off the ground.

What is the biggest or most surprising innovation highlighted in this study? 

We were astounded by just how prevalent blood parasite transcripts can be within host transcriptomes. For instance, in one bird (Vireo plumbeus sampled in the mountains of New Mexico) we found that nearly 17% of all contigs generated from the initial Trinity assembly were derived from a parasite that was infecting just 0.75% of all blood cells. A second surprising finding was the degree to which many of the birds that we sampled were infected with complex communities of parasites that we did not detect using traditional microscopic and DNA barcoding methods. Across all samples we found that transcriptomes revealed about 20% more infections than the methods that are typically used to study these parasites. This included one individual bird that was infected by three different genera and at least six species of malaria parasite.

Moving forward, what are the next steps in this area of research?

While it is exciting to find that a transcriptomic approach can improve our ability to study the genomic diversity and abundance of wildlife blood parasites, it still remains a rather inefficient approach – at the end of the day, the majority of transcripts from each sample came from the host organism that was not the focus of our study. The next step will be to apply single-cell and other advanced RNA sequencing techniques that have successfully been applied to model systems to provide greater resolution to studies of blood parasite gene expression and host-parasite interactions.   

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

At risk of sounding overly pessimistic, be prepared for things to fail the first time around and have a plan B in place. It is wonderful to have a lot of confidence, but pessimism does tend to favor preparedness. Small actions within this frame of mind can save you a lot of grief in the long run, and can be as simple as testing a new method on a sample that isn’t important before you start your project or taking the time to visit a lab to learn a technique before you try it yourself. I naturally assume everything I try in the lab will fail, so each time things work (and they actually often do!) it is a pleasant surprise.

What have you learned about methods and resources development over the course of this project? 

I think that there is a difference between producing a resource, and producing a resource that is easily accessible to the broader research community in practice. As a result, I spent a lot of time thinking about how my colleagues would most directly benefit from the data that we had generated. In the end we made the data from this study available in as many formats as we thought might be useful to other researchers (raw sequences, assemblies from before and after parasite identification, curated alignments, DNA barcodes, etc.). The amount of time that it took to prepare these datasets was extremely small relative to the length of the entire project, and I think will go a long way towards making these data as useful as possible.

Describe the significance of this research for the general scientific community in one sentence.

This study improves our ability to research the ecology and evolution of wildlife blood parasites, a cosmopolitan and ubiquitous group that is widely relevant to global health.

Describe the significance of this research for your scientific community in one sentence.

The methodological framework that we present in this study profoundly improves the genomic resource base that is available for research on understudied blood pathogens of wildlife, as well as our ability to detect multi-species parasite communities within hosts.

Interview with the author: Creating the SPIKEPIPE metagenomic pipeline

Obtaining reliable abundance estimates is a significant challenge for eDNA metagenomic studies. One important issue is that sequencing introduces multiple sources of noise that can significantly alter the accuracy of abundance estimates. Here we interview Douglas Yu, a professor at the University of East Anglia, about the SPIKEPIPE pipeline recently published in Molecular Ecology Resources. This method is particularly exciting as it can use either short-read barcodes or mitogenome data to estimate species abundances by accounting for sequencing noise using correction factors. The authors test this eDNA pipeline on arthropod samples taken from the High Arctic in Greenland and show that this approach can produce remarkably accurate species abundance estimates compared to samples of known composition. Read the full article here and get the code to run this pipeline here.

The 5 steps of SPIKEPIPE.

What led to your interest in this topic / what was the motivation for this study? 

We very much want to know how a heating climate is affecting biodiversity. Greenland is a direct window into this, both because heating has progressed very fast here, and because local species richness is manageable for study:  375 known aboveground arthropod species at the Zackenberg research station. Equally important, the Danish research station at Zackenberg had had the foresight to systematically collect arthropods starting in 1996, and those samples were sitting in ethanol in a warehouse in Denmark. The main obstacle to using them had been that no one could identify the hundreds of thousands of individuals to species level. Luckily, Helena Wirta and Tomas Roslin had in parallel carried out a DNA barcoding campaign at Zackenberg. Put together, we had in our hands a complete time series of community dynamics over a stretch of time during which summer had almost doubled in length. 

What difficulties did you run into along the way? 

When we started, we were all set to use metabarcoding. However, we soon learned (not surprisingly) that the sample-handling protocols had not been designed with molecular methods in mind:  the trap water was reused across time periods, the collecting net was used across traps, and the sorting trays were not bleached between samples. We thus needed a protocol that would be robust to cross-sample contamination and would ideally return quantitative information, since we wanted to detect change in population dynamics. This is why we turned to mitochondrial metagenomics (Tang et al. 2015, Crampton-Platt et al. 2016) and came up with SPIKEPIPE, which combines read-mapping, a percent-coverage detection threshold, and a spike-in to correct for pipeline stochasticity. 
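The way a coverage threshold and a spike-in can work together is easier to see in a small sketch. The following is a simplified illustration, not the SPIKEPIPE code itself: the function name, the input dictionaries, and the `min_coverage=0.5` default are all hypothetical, and the published pipeline may compute its correction factors differently. The idea shown is only that a species counts as detected when enough of its reference sequence is covered by mapped reads, and that dividing by the reads mapped to a fixed-amount spike-in cancels out sample-to-sample differences in sequencing depth.

```python
def spike_corrected_abundance(read_counts, coverage, spike_reads, min_coverage=0.5):
    """Return {species: reads per spike-in read} for detected species.

    read_counts  -- reads mapped to each species' reference (hypothetical input)
    coverage     -- fraction of each reference covered by at least one read
    spike_reads  -- reads mapped to the spike-in added in a fixed amount
    min_coverage -- assumed detection threshold on the covered fraction
    """
    if spike_reads <= 0:
        raise ValueError("spike-in reads are needed to normalise the sample")
    return {
        species: n_reads / spike_reads
        for species, n_reads in read_counts.items()
        if coverage.get(species, 0.0) >= min_coverage
    }

# The same community sequenced at two depths gives the same corrected estimate.
shallow = spike_corrected_abundance(
    {"species_A": 200, "species_B": 30},
    {"species_A": 0.9, "species_B": 0.1},
    spike_reads=100,
)
deep = spike_corrected_abundance(
    {"species_A": 400, "species_B": 60},
    {"species_A": 0.9, "species_B": 0.1},
    spike_reads=200,
)
```

Here `species_B` is dropped in both runs because its reads cover only a small part of the reference, which is the kind of pattern one might expect from low-level cross-sample contamination.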

What is the biggest or most surprising innovation highlighted in this study? 

The individual elements of SPIKEPIPE were reasonably well known, but what we hadn’t anticipated is just how accurate the results were when combined in a single pipeline. With mock samples, we found no false-positive species detections (when the percent-coverage threshold is applied) and recovered highly accurate estimates of intraspecific abundances (in terms of DNA mass). With resequenced environmental samples, we found high repeatability of abundance estimates across sample repeats, even though DNA extraction and Illumina library prep, sequencing, and base-calling all inject stochasticity into datafile sizes.

Also very gratifying was finding that SPIKEPIPE returned useful data even when mapping reads only to short DNA barcodes, as originally presaged by Xin et al. (2013). This means that we can make use of the existing vast DNA-barcode reference library.

Moving forward, what are the next steps in this area of research?

SPIKEPIPE is of course only the means to an end, and our next goal is the statistical analysis of community change in a rapidly heating ecosystem. Nerea Abrego and Otso Ovaskainen are now applying joint species distribution modelling (with the R package Hmsc, Tikhonov et al. 2019) to the dataset of 712 pitfall-trap samples. One important question is to quantify how much of the year-to-year variation in species abundances can be attributed to species interactions, as opposed to climate variables. 

More broadly, the result that SPIKEPIPE can be used with DNA barcodes makes possible an intriguing strategy:   one may now generate both the species reference database and the sample-by-species table from the same set of samples. We are using Greenfield et al.’s (2019) Kelpie software to carry out targeted assembly of DNA barcodes from shotgun-sequenced bulk samples, which we compile into a single DNA-barcode reference database, against which we then map reads from each sample to generate the data table. 

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Build in a lot of testing:  multiple, complex mock samples for pipeline development, repeat environmental samples to measure repeatability, realistically complex positive controls, many negative controls, and many sanity checks as you work through your bioinformatic code. 

You are likely to be learning to code at the same time that you write your first pipelines. Take the extra time *now* to learn and apply robust coding techniques, even if there are easier but less robust methods available. 

Read Jenny Bryan’s tutorial on file naming:  https://speakerdeck.com/jennybc/how-to-name-files

What have you learned about methods and resources development over the course of this project? 

A great way to inspire new methods is to talk with non-molecular researchers about their scientific questions, currently used methods, and available sample types. Our team includes arctic ecologists, molecular ecologists, and a mathematician.

For one’s method to have impact, it will need to be useful for years after one first thinks of it. Stay up to date with technology trends, including costs, to avoid rapid obsolescence.

Describe the significance of this research for the general scientific community in one sentence.

We can use DNA sequencing to quantify how insect and spider communities respond to environmental change.

Describe the significance of this research for your scientific community in one sentence.

Mitochondrial metagenomics is a viable alternative to amplicon sequencing for characterising arthropod communities. 

CRISPR-Cas Diagnostics for Environmental Monitoring

In a special blog post, Molly-Ann Williams (@WilliamsMolly_9) and Anne Parle-McDermott (@anne_parle) from the School of Biotechnology and DCU Water Institute, Dublin City University provide an overview of how CRISPR-Cas works and how it can be applied to ecology and monitoring in particular. Read their recently published Molecular Ecology Resources paper here.

The field of CRISPR-Cas for genome editing has simply exploded since its introduction in 2012. The discovery of many different Cas enzymes with additional natural or genetically engineered functionalities is resulting in an increase in CRISPR-Cas applications across all fields from food security to medicine.

Number of Scopus search results for query “CRISPR” in given year. Search performed on 21 November 2019.

So how can we join the revolution and apply CRISPR-Cas to the field of Ecology?

CRISPR-Cas systems consist of two main elements: a guide and a nuclease. Guides (made of RNA) direct the nuclease (Cas enzyme) to specific nucleic acid sequences (DNA or RNA). Upon target recognition the nuclease carries out the desired response, most commonly cleavage of the target sequence. The initially discovered CRISPR-Cas system relied on a nuclease called Cas9. This enzyme is involved in highly specific cleavage of target sequences that allow genome editing to occur by activating the natural repair system of the cell. More recently the applications of this system have been expanded beyond genome editing by the discovery of several new Cas enzymes with a secondary function i.e., the indiscriminate cleavage of single stranded nucleic acids upon target recognition. The discovery of these Cas enzymes has revolutionised nucleic acid diagnostics due to two main features:

  1. Protein-guide and cleavage molecules (Cas): able to specifically recognise target nucleic acids, cleave the target sequence and subsequently cleave other non-specific nucleic acids.
  2. Nucleic acids as reporters: the non-specific nucleic acids can be designed as a reporter molecule that releases measurable signal when cleaved. This allows us to visualise when the initial target sequence has been detected and apply it to diagnostics and species monitoring.

Two main elements of a CRISPR-Cas diagnostic system: Cas enzyme and guide RNA effector complex and single stranded (ss) nucleic acid reporter molecule. In this example, the nuclease is Cas12a specific to DNA detection downstream from a TTTV PAM site. Adapted from Williams MA et al (2019).

The three main Cas enzymes of interest for diagnostics are Cas12, Cas13 and Cas14, each with unique functions applicable to different types of tests (for a more detailed discussion of these enzymes visit this blog).

The Cas enzyme most relevant for single species detection from environmental DNA is the enzyme Cas12a. This nuclease can detect both ssDNA and dsDNA but can only recognise DNA sequences downstream from a TTTV protospacer adjacent motif (PAM). Importantly, Cas12a cannot detect DNA sequences missing this PAM site. This is vital when designing single species detection assays.

Do you have two closely related species that you want to distinguish? Searching your target species sequence for a site downstream of a PAM site found ONLY in your target, and not in sympatric species, will ensure highly specific recognition and prevent detection of non-target species.
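As a rough sketch of that search, the snippet below lists the sequences lying immediately downstream of a TTTV motif (V being A, C or G) in a target sequence and keeps only those that do not also sit downstream of a PAM in any non-target sequence. The function names, the 20-nucleotide target length, the exact-match comparison, and the toy sequences are assumptions for illustration; a real assay design would also need to consider the reverse strand and near-matches.

```python
import re

def find_protospacers(seq, spacer_len=20):
    """Return the set of sequences found directly downstream of a TTTV PAM.

    A lookahead is used so that overlapping PAM sites are all reported.
    Only the strand as given is scanned (an assumption of this sketch).
    """
    pattern = re.compile(r"(?=TTT[ACG]([ACGT]{%d}))" % spacer_len)
    return {match.group(1) for match in pattern.finditer(seq.upper())}

def species_specific_targets(target_seq, non_target_seqs, spacer_len=20):
    """Keep PAM-adjacent sites from the target that no non-target shares."""
    candidates = find_protospacers(target_seq, spacer_len)
    for other in non_target_seqs:
        candidates -= find_protospacers(other, spacer_len)
    return sorted(candidates)

# Toy sequences: the sympatric species carries the same site but lacks the PAM.
site = "ACGTACGTACGTACGTACGT"
target_species = "GG" + "TTTA" + site + "CC"
sympatric_species = "GG" + "CTTA" + site + "CC"
usable_sites = species_specific_targets(target_species, [sympatric_species])
```

In this toy case the site remains usable because the non-target sequence has no TTTV motif upstream of it, so under the rule described above Cas12a would not be expected to recognise it there.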

What if you work with environmental RNA? Well, there is a CRISPR-Cas system for you too! The Cas enzyme Cas13 differs from Cas12a in that it recognises single stranded RNA molecules, with non-specific cleavage of ssRNA following target cleavage, i.e., it works the same as Cas12a but targets RNA rather than DNA.

The world of CRISPR diagnostics is still in its early stages but with the discovery of new CRISPR-Cas systems with unique functions, there is no reason ecologists cannot utilise these diagnostic tools to enhance environmental monitoring using molecular techniques. For more information on using CRISPR-Cas diagnostics for single species detection from environmental DNA read our paper here.

Methods summary: Addressing (one of) the challenges of RADseq

Article by Evan McCartney-Melstad and Brad Shaffer from University of California at Los Angeles

RADseq is a great method for gathering genomic data to answer biological questions across many different scales, from phylogenetics to population and landscape genetics. It is fast, inexpensive, and requires no previous knowledge about the species’ genomic architecture. However, with this flexibility come challenges. In this paper we develop and bench test an approach to address what may be the biggest RADseq challenge: how to choose the right sequence similarity threshold that defines whether two non-identical sequencing reads arose from the same or different genomic locations. This problem goes to the heart of evolutionary genetics: if two sequences are considered to be homologous, or derived from the same ancestral genomic location with subsequent modification through time, then they tell us a great deal about evolutionary history. If they are paralogous, and map to separate locations, then they lack that shared evolutionary history. Getting this straight is perhaps the single most important step in using genomic data for evolutionary inference.

Heat maps showing pairwise data missingness at clustering thresholds of 88% (a) and 99% (b). 

Studies that include relatively distantly related samples, such as those asking phylogenetic or biogeographical questions, should expect that homologous sequences will have diverged over time and therefore require lower similarity thresholds that allow for that divergence. However, if the threshold is set too low, paralogs will be falsely assigned to the same genomic locus, leading to problems ranging from inflated missing data rates to inaccurate measures of genetic diversity. Rather than relying on rough guesses that are preset in software packages, our approach attempts to balance these two competing forces by quantifying the relationship between pairwise genetic relatedness (as estimated directly from the data) and summaries of the RADseq dataset including pairwise data missingness and the slope of isolation by distance among samples. The relationship between pairwise genetic distance and pairwise data missingness is particularly informative—although some positive correlation is expected as mutations accumulate in enzyme restriction sites that RAD relies on, there is often a clear pattern of increased pairwise missingness that occurs when the most divergent homologous allelic variants begin to be erroneously oversplit into different presumptive loci. By explicitly looking for this breakpoint as a function of clustering threshold, researchers can choose a value that allows them to maximize the number of genomic regions recovered while minimizing the erroneous oversplitting of highly divergent, but homologous loci.
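The pairwise summaries described above can be sketched in a few lines. This is an illustrative outline rather than the authors' pipeline: the function names and the input layout are hypothetical, and pairwise missingness is assumed here to mean the fraction of loci at which at least one sample of a pair has no data. Computing it for the dataset produced at each clustering threshold and correlating it with pairwise genetic distance gives one number per threshold in which to look for a breakpoint.

```python
from itertools import combinations

def pairwise_missingness(genotypes):
    """Map each sample pair to the fraction of loci missing in either sample.

    genotypes -- dict of sample name -> list of calls, with None for a
                 locus that was not recovered (hypothetical input layout)
    """
    result = {}
    for a, b in combinations(sorted(genotypes), 2):
        calls_a, calls_b = genotypes[a], genotypes[b]
        missing = sum(1 for x, y in zip(calls_a, calls_b) if x is None or y is None)
        result[(a, b)] = missing / len(calls_a)
    return result

def pearson(xs, ys):
    """Plain Pearson correlation, used to relate missingness to distance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy dataset for a single clustering threshold.
calls = {
    "s1": [0, 1, None, 2],
    "s2": [0, None, None, 1],
    "s3": [1, 1, 0, 2],
}
missingness = pairwise_missingness(calls)
```

Repeating this for each threshold and plotting `pearson(distances, missingness_values)` against the threshold is one simple way to see where the relationship starts to strengthen sharply.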

Citation: McCartney‐Melstad, E, Gidiş, M, Shaffer, HB. An empirical pipeline for choosing the optimal clustering threshold in RADseq studies. Mol Ecol Resour. 2019; 19: 1195– 1204. https://doi.org/10.1111/1755-0998.13029

Methods summary: Applying CRISPR to detect eDNA

Article by Molly-Ann Williams and Anne Parle-McDermott both from Dublin City University

We were challenged to design and build a simple and rapid species monitoring system. Why do we need such a system? Biodiversity loss is at an all-time high and such a system would help to support the management and conservation of fish species within aquatic environments by acquiring knowledge of species distribution that traditionally is gained through visual detection and counting. These methods are expensive, time consuming and can lead to harm of the species of interest. We decided that environmental DNA (eDNA) was the way to go but we had to solve the ‘PCR problem’, i.e., avoid the need to cycle through high temperatures, as that would see us ending up with a costly, once-off device that would likely not be applied outside our lab. This got us brainstorming and led us to a novel isothermal detection method, combining Recombinase Polymerase Amplification with CRISPR-Cas detection, which simplifies the adaptation of nucleic acid detection onto a biosensor device.

This innovative methodology utilises the collateral cleavage activity of Cas12a, a nuclease guided by a highly specific single CRISPR RNA, to detect specific species from eDNA. We proved it could work for eDNA by applying the technology to the detection of Salmo salar from eDNA samples collected in Irish rivers, where presence or absence had been previously confirmed using conventional field sampling. The beauty of this advance is that it can be applied to any species in the environment. Not only does this assay solve the ‘PCR problem’, it is also a better approach for distinguishing very closely related species. We look forward to others in the field adapting it to their own favourite species of interest.

Citation: Williams, M‐A, O’Grady, J, Ball, B, et al. The application of CRISPR‐Cas for single species identification from environmental DNA. Mol Ecol Resour. 2019; 19: 1106– 1114. https://doi.org/10.1111/1755-0998.13045