Interview with the authors: glacial refugia and the dispersal of terrestrial invertebrates

Antarctica is an extreme and isolated environment that supports a variety of species. However, we know little about how terrestrial species survive in these kinds of conditions. In a recent paper in Molecular Ecology, McGaughran and colleagues investigated a widespread group of terrestrial invertebrates to understand how species have persisted in this harsh environment. These researchers found that there were many local clusters of individuals with substantially more long-distance dispersal events than were previously identified. These long-distance dispersers were likely aided by wind, providing an interesting example of the link between environmental conditions and population stability. For more information, please see the full article and the interview with McGaughran, lead author of the study, below. 

The Antarctic Peninsula, photographed near its tip. Photo by Dr. Ceridwen Fraser.

What led to your interest in this topic / what was the motivation for this study? 
During my PhD, I researched the genetic and physiological diversity of Antarctic terrestrial invertebrates, spending a collective ~6 months on the ice. I then stepped away from Antarctic research for several years, completing postdocs in Germany and Australia, but I never forgot my time in Antarctica or my love for its unique environment. Thus, I’ve maintained collaborative links that have allowed me to continue to contribute to Antarctic research. In this study, we wanted to see whether genomic data would give us greater insight into the evolutionary history of invertebrates along the Antarctic Peninsula than had been gained with single-gene analysis in the past.

What difficulties did you run into along the way? 
Getting workable quantities of DNA from tiny (~1 mm) springtails to use in genomic applications is difficult.  In fact, for this study, we tried to extract DNA from several Antarctic springtail species, but were only successful in our attempts with Cryptopygus antarcticus antarcticus.  Low DNA concentrations can also mean that the genomic data we end up with for analysis is patchy.  These aspects provide some challenges, but the methodologies underlying library preparation and sequencing are continually improving and we are excited about the potential of applying genomic methodologies to more Antarctic taxa in the future.

What is the biggest or most surprising finding from this study? 
Using genome-wide data, we were able to find evidence for a greater frequency of dispersal events than had been previously shown with single-gene data.  This was particularly surprising because dispersal for Antarctic invertebrates is hard.  These animals live under the rocks in moist ice-free areas.  As soon as they leave the relative safety of the soil column, they are exposed to freezing and desiccating conditions.  Thus, though we have some evidence to suggest that springtails can survive for short periods in humid air columns or floating on water, our expectation is that such events would be rare.  Finding genetic evidence that suggested several instances of successful dispersal over extremely long geographic distances was therefore surprising.

Moving forward, what are the next steps for this research? 
Much of the Antarctic literature focused toward understanding evolutionary and biogeographic questions has been based on single-gene analyses because genomic approaches are still relatively new.  This previous work has been informative about the fact that many Antarctic terrestrial species have survived glaciation in refugia, but there is much that remains to be discovered.  Antarctica is a kind of barometer for the rest of the world and it is important that we understand how species there have responded to environmental change in the past and how they may do so in the future.  Thus, key to extending this research will be to bring genomic approaches to bear on other populations and species in Antarctica.  This will help us to gain an understanding of how isolated Antarctica really is, and how its endemic species will likely respond to future environmental changes.

What would your message be for students about to start their first research projects in this topic? 
In this genomic and associated bioinformatic era, learning the skills of a well-rounded biologist who has a breadth of understanding that spans the field, the laboratory, and the computer, can be daunting.  As you develop or use novel techniques in Molecular Ecology, my message would be to stick with it through the hard stuff.  It is such an exciting time to be an evolutionary biologist and, though it can involve some really tough moments, the revelations we can achieve about how the world works are key.  Alongside this, I would suggest that collaboration is now more important than ever – don’t feel like you have to reinvent the wheel or be an expert on every single aspect of your research.  Instead, develop your own niche and share in the expertise of those around you to do the best science together.

What have you learned about science over the course of this project? 
When I first started doing research, there was no such thing as genomics or next generation sequencing and we simply didn’t have the means to gain genome-wide data.  In recent years, the face of evolutionary biology has changed due to the revolution in sequencing technology and bioinformatics.  As exemplified by this project, I’ve learned that genomic data can provide new and more nuanced insights into our biological questions of interest.  And, though it can be hard at times to work in such a swift-moving area of research, it is ultimately very rewarding.

Describe the significance of this research for the general scientific community in one sentence.
The environment, especially wind, plays an important role in structuring patterns of genetic diversity among Antarctic populations – thus future climatic changes are likely to have a significant impact on the distribution and diversity of these populations.  

Describe the significance of this research for your scientific community in one sentence.
Bringing genomic data to bear on long-standing evolutionary questions in Antarctica is a worthwhile and fruitful endeavour that will ultimately produce greater insights into understanding and protecting Antarctic taxa.

The Antarctic Dry Valleys. Photo by Dr. Angela McGaughran.

McGaughran A, Terauds A, Convey P, Fraser CI. 2019. Genome‐wide SNP data reveal improved evidence for Antarctic glacial refugia and dispersal of terrestrial invertebrates. Molecular Ecology. 28:4941-4957. https://doi.org/10.1111/mec.15269.

Interview with the authors: Parent and offspring genotypes influence gene expression in early life

Early life stress can often have long-term fitness effects on organisms, and the molecular mechanisms behind this have long been of interest to biologists. While much work has demonstrated that changes in DNA methylation patterns are involved, the transcriptional effects of early life stress are less well understood, particularly at a genome-wide level. In a recent Molecular Ecology paper, Daniel J. Newhouse and colleagues investigate the transcriptional effects of different parental care strategies in white-throated sparrows. In white-throated sparrows, there are two morphs and two associated mating pair types: tan male x white female (TxW) pairs, and white male x tan female (WxT) pairs. While TxW pairs provide biparental care, WxT pairs provide female-biased parental care. Newhouse and colleagues use RNA sequencing to assess the transcriptional effects of these differences in parental care strategies. They find evidence of an elevated stress response in offspring of WxT pairs. For more information, read the full article, and see the in-depth interview with the authors below.

 A white morph female white-throated sparrow feeding her nestlings. Photo credit: Tiffany Deater.

What led to your interest in this topic / what was the motivation for this study? 
Early in graduate school, I participated in the white-throated sparrow genome sequencing project. That project was my crash course in white-throated sparrow biology, and the unique genetics and associated behaviors of the sparrows fascinated me. Most work on white-throated sparrows focuses on the adults, but nestlings are relatively understudied.  Depending on the adult pair type of the nest, nestlings will either receive biparental care (parents=tan morph male & white morph female) or female-biased parental care (parents=white morph male & tan morph female). Essentially, I wanted to see how this parental care variation impacted the nestlings.

What difficulties did you run into along the way? 
Finding white-throated sparrow nests was much harder than I ever imagined. Hiking through bogs while fighting off swarms of biting insects made it even more difficult. Thankfully, I have wonderful collaborators who are amazing at finding nests.
Also, when we designed this study, there weren’t many examples of RNA-seq from bird blood. White-throated sparrow nestlings are very small, so the amount of blood we can collect is quite small. RNA extractions proved more difficult than expected, but we managed to sequence a sufficient amount for the study.

What is the biggest or most surprising finding from this study? 
It was surprising to see a solid signature of morph-specific gene expression. As adults, there are many differences in the transcriptome between morphs and these correlate strongly with their behavior. White-morph and tan-morph nestlings look the same and do not exhibit any morph-specific behaviors like we see in adults. Despite this, we found that a large number of genes found within the chromosomal inversion are differentially regulated. Some of these genes have also been previously identified in the brain of adult white-throated sparrows. It was cool to see the same genes appear very early in life and in a much different tissue (blood).

Moving forward, what are the next steps for this research? 
From a genomics perspective, it would be great to identify the regulatory mechanisms underlying the gene expression signatures we identified here. Additionally, within a single nest, there are both white morph and tan morph nestlings. This allows us to look at nestling morph-specific responses to variation in parental care. We identified some differences between the morphs within a nest, but our sample size was ultimately too small to discuss this in depth. I think this will be a really interesting topic to explore further.

What would your message be for students about to start their first research projects in this topic? 
I suggest pursuing integrative projects, like much of the work published in Molecular Ecology. Associated with that, I suggest networking and establishing collaborations early. We can’t all be experts in everything, so collaborating with research groups that complement your interests can be beneficial.
More generally, keep up with the literature as much as you can. The more you know about your system and anything related to it, the better. Don’t forget to read up on methods papers, too. Data analysis is very important so having a grasp on analytical concepts will really help.

What have you learned about science over the course of this project? 
There’s no universal way to analyze data. There are so many tools to process genomic data, so it can be overwhelming at times to keep track of everything. I also learned that data analysis takes much longer than you plan. Inevitably something won’t work, so keeping a positive attitude throughout is crucial.

Describe the significance of this research for the general scientific community in one sentence.
Parental genotype is correlated with a transcriptomic stress response in their offspring.

Describe the significance of this research for your scientific community in one sentence.
Half of all adult white-throated sparrow pairs provide female-biased parental care and this stable parental care strategy induces a transcriptomic stress response in their offspring.

Newhouse DJ, Barcelo‐Serra M, Tuttle EM, Gonser RA, Balakrishnan CN. Parent and offspring genotypes influence gene expression in early life. Mol Ecol. 2019;28:4166–4180. https://doi.org/10.1111/mec.15205.

Interview with the authors: Background selection and FST: Consequences for detecting local adaptation

Recent work has suggested that background selection (BGS) may lead to incorrect inferences in FST outlier studies, generating substantial concern given the prevalence of these studies in evolutionary biology. In their recent Molecular Ecology publication, Matthey‐Doret and Whitlock investigate the effects of BGS on FST outlier tests using biologically realistic simulations, and find minimal effects. Matthey-Doret and Whitlock suggest that previous studies used unrealistic parameter values in simulations, leading to an overestimate of the effects of BGS in real studies. Read the full article here: https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.15197, and get a behind-the-scenes look at this work below.
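For readers new to FST outlier tests, the basic idea can be sketched in a few lines. The sketch below is only an illustration and is not the method or software used in the paper: it assumes biallelic loci, two equally weighted populations, Wright's simple estimator FST = (HT - HS) / HT, and a crude "top fraction" rule for flagging outliers. The function names and the toy allele frequencies are hypothetical.

```python
def fst_per_locus(p1, p2):
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # locus is fixed for the same allele everywhere
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


def flag_outliers(fst_values, top_fraction=0.05):
    """Mark loci whose FST lies in the top fraction of the empirical distribution.

    Ties at the cutoff are all flagged, so slightly more loci than the
    requested fraction can be returned.
    """
    n_top = max(1, round(len(fst_values) * top_fraction))
    cutoff = sorted(fst_values, reverse=True)[n_top - 1]
    return [value >= cutoff for value in fst_values]


if __name__ == "__main__":
    # Toy allele frequencies (population 1, population 2) for four loci.
    loci = [(0.50, 0.52), (0.30, 0.35), (0.10, 0.90), (0.60, 0.55)]
    fst = [fst_per_locus(p1, p2) for p1, p2 in loci]
    for (p1, p2), value, is_outlier in zip(loci, fst, flag_outliers(fst, 0.25)):
        print(p1, p2, round(value, 3), is_outlier)
```

The concern examined in the paper is whether background selection inflates locus-to-locus variation in values like these enough to produce false outliers. Published outlier methods use more careful estimators and null models than this quantile rule.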

Remi Matthey‐Doret uses his new program SimBit to study the effects of background selection (BGS) on FST.

What led to your interest in this topic / what was the motivation for this study? 
It all started with a paper by Cruickshank and Hahn (2014), in which they highlight a fear that background selection could be a confounding factor to local adaptation in FST outlier studies. Curious about this issue, Mike and I investigated the question further and quickly realized that many of these fears were based on a misinterpretation of Charlesworth et al. (1997). Indeed, Charlesworth et al. (1997) demonstrated that background selection can cause FST peaks only for extreme and unrealistic parameter sets. They highlighted that their parameter choice was unrealistic, as their goal was to find extreme effects, but this important limitation of their study was sadly often ignored by their readers. We therefore decided to perform simulations of background selection with realistic parameter choices.

What difficulties did you run into along the way? 
The main difficulty was technical. We tried to run these simulations with a number of popular simulation packages, but none of them was fast enough for our needs. We quickly realized that we had to write our own simulation software (SimBit) that would perform very well, especially for simulations with a lot of genetic diversity.

What is the biggest or most surprising finding from this study? 
Starting the study, I was actually expecting that background selection would have a stronger effect on FST and that it would bias the FST outlier methods used to detect local adaptation. Our finding was a surprise to us, but it was also comforting to realize that the results of the many studies using FST outlier methods were probably not affected by background selection.

Moving forward, what are the next steps for this research? 
I think there is a need for a clearer view of the relative importance of positive and negative selection in explaining patterns of genetic diversity within and between populations. I would also like to investigate further the interaction between selection coefficient and migration rate, and how it affects genetic diversity within and between populations. Such an endeavor would likely require a mixture of empirical and theoretical work.

What would your message be for students about to start their first research projects in this topic?  
I think there is a lot of intuition about the effect of linked selection in structured populations that has not been published. Talk to smart people! They may have some expectation about how background selection can affect the coalescent tree in structured populations that needs to be studied and written out.

What have you learned about science over the course of this project? 
I learned that a lot of the numerical tools that we use to analyse genetic data contain bugs (one of which is detailed in our article) and unstated (or somewhat neglected) assumptions. One must always be careful to understand a particular piece of statistical software well before using it.

Describe the significance of this research for the general scientific community in one sentence.
We found that background selection does not cause peaks of population differentiation, and therefore that methods that use population differentiation to detect positive selection should be safe to use without worrying that background selection is a confounding factor.

Describe the significance of this research for your scientific community in one sentence.
We found that background selection does not cause much locus-to-locus variation in FST, and therefore that FST outlier methods for detecting positive selection should be safe to use without worrying that background selection is a confounding factor.

Full article:

Matthey‐Doret R, Whitlock MC. Background selection and FST: Consequences for detecting local adaptation. Mol Ecol. 2019;28:3902–3914. https://doi.org/10.1111/mec.15197.

Interview with the authors: Lack of gene flow: Narrow and dispersed differentiation islands in a triplet of Leptidea butterfly species

A diverse array of evolutionary processes contribute to diversity and divergence, and as large genomic datasets become more readily available our ability to parse apart these processes increases. In their recent Molecular Ecology publication, Talla et al. generate genomic data from six populations of wood white butterflies and use this data to try to tease apart the effects of introgression, recombination rate variation, selection, and genetic drift. In contrast to many previous genome-scan studies, they find no evidence of introgression or parallelism. Rather, they find support for genetic drift and directional selection as having shaped genomic divergence between species. Read the full article here: https://doi.org/10.1111/mec.15188, and learn more below with a behind-the-scenes interview with the authors.

What led to your interest in this topic / what was the motivation for this study? 
We have a general interest in understanding the contribution of different molecular mechanisms and evolutionary forces to genomic differentiation between diverging lineages. Previous research in this area has revealed a rather complex interaction between selection, genetic drift, recombination rate variation and introgression and we thought we had found an ideal study system to tell these factors apart. In addition, we believe that it will be key to describe the divergence landscapes in many different taxonomic groups to understand the relative importance of different molecular and evolutionary factors in lineages with different genetic/genomic features, demographic histories and life-history characteristics.

What difficulties did you run into along the way? 
Our expectations were not really met regarding the study system. First, earlier observations suggested that hybridization occurs between species pairs in the Leptidea group when they occur in sympatry, indicating that introgression might differ between sympatric and allopatric species pairs, but this turned out to be wrong. Second, butterflies lack centromeres, and this could indicate a more even recombination landscape than what is generally observed in taxa with centromeres, but we could not address this with our data. Third, we expected that the divergence time between lineages was short, which again turned out to be wrong. Finally, the three species are characterized by large differences in karyotype, and we wanted to investigate whether chromosomal rearrangements could underlie reproductive isolation, but this goal was out of reach with our data.

What is the biggest or most surprising finding from this study? 
It was surprising to us that there was no evidence for interspecific gene flow since hybrids have been observed. We were also very surprised by the deep divergence times between these virtually identical species. Besides that, we do not think the results are really surprising, but they do give some novel insight into the patterns of genomic divergence when there is no introgression and when chromosomes lack centromeres. One observation that we found interesting was that regions with high genetic differentiation (FST) had higher genetic divergence (DXY) than the genomic average. This may sound intuitive, but many previous ‘genome-scan’ studies have in fact found a negative relationship between differentiation and divergence, most likely as a consequence of reduced recombination in some regions leading to reduced diversity already before lineages started to diverge.

Moving forward, what are the next steps for this research? 
We are developing more resources to generate genome assemblies of multiple species in the study system and we are also working on establishing high-density linkage maps for multiple populations with different karyotypes. These tools will help us pinpoint chromosome rearrangements and investigate if these have played a role in the divergence process. The data will also be used to quantify the effects of fissions and fusions on the recombination landscape. We are also delving into other approaches to understand how ecological and behavioral differences between species leave footprints in the DNA sequences or epigenetic marks (and vice versa). Given the deep divergence times between species and the apparent lack of gene flow, we will mainly focus on intraspecific comparisons where we observe some incompatibilities between some populations with distinct karyotypes.

What would your message be for students about to start their first research projects in this topic? 
We suggest reading up on the previous literature in detail. We also encourage students to contact leading researchers in the field to discuss potential questions. Most people are really helpful and interested in knowing about other research efforts within their field. Discussing directly with experienced researchers also gives a hint of the key questions that should be addressed to extend knowledge in the field. Given the copious amount of data we generate these days and the integrative nature of the questions we ask, it is also crucial to develop some skills in bioinformatics and scripting and to have a network of collaborators/colleagues who can provide help and support in theory, experimental studies and data analyses.

What have you learned about science over the course of this project? 
That science is an extremely time-consuming and dynamic process, and that the first glimpse of the data does not necessarily reflect the final results. Moreover, project plans need to be revised regularly, because the initial strategies often do not work out as they were outlined. We also acknowledge the importance of establishing a network of colleagues with expertise in different areas of the field. In our experience, most research projects within our field are becoming more and more integrative, and it will be increasingly difficult to conduct advanced research without collaboration.

Describe the significance of this research for the general scientific community in one sentence.
We verify that genomic differentiation between diverging lineages is affected by a complex interaction between molecular mechanisms and evolutionary forces and stress the importance of studying organisms with different genomic features, demographic histories and life-history characteristics.

Describe the significance of this research for your scientific community in one sentence.
In contrast to much of the previous work on patterns of genomic diversity and differentiation, our study provides insight into divergence processes when the effects of gene flow and/or a shared and highly variable recombination landscape are absent.

Full article: Talla V, Johansson A, Dincă V, et al. Lack of gene flow: Narrow and dispersed differentiation islands in a triplet of Leptidea butterfly species. Mol Ecol. 2019;28:3756–3770. https://doi.org/10.1111/mec.15188.

Methods summary: Addressing (one of) the challenges of RADseq

Article by Evan McCartney-Melstad and Brad Shaffer from University of California at Los Angeles

RADseq is a great method for gathering genomic data to answer biological questions across many different scales, from phylogenetics to population and landscape genetics. It is fast, inexpensive, and requires no previous knowledge about the species’ genomic architecture. However, with this flexibility come challenges. In this paper we develop and bench test an approach to address what may be the biggest RADseq challenge: how to choose the right sequence similarity threshold that defines whether two non-identical sequencing reads arose from the same or different genomic locations. This problem goes to the heart of evolutionary genetics: if two sequences are considered to be homologous, or derived from the same ancestral genomic location with subsequent modification through time, then they tell us a great deal about evolutionary history. If they are paralogous, and map to separate locations, then they lack that shared evolutionary history. Getting this straight is perhaps the single most important step in using genomic data for evolutionary inference.

Heat maps showing pairwise data missingness at clustering thresholds of 88% (a) and 99% (b). 

Studies that include relatively distantly related samples, such as those asking phylogenetic or biogeographical questions, should expect that homologous sequences will have diverged over time and therefore require lower similarity thresholds that allow for that divergence. However, if the threshold is set too low, paralogs will be falsely assigned to the same genomic locus, leading to problems ranging from inflated missing data rates to inaccurate measures of genetic diversity. Rather than relying on rough guesses that are preset in software packages, our approach attempts to balance these two competing forces by quantifying the relationship between pairwise genetic relatedness (as estimated directly from the data) and summaries of the RADseq dataset including pairwise data missingness and the slope of isolation by distance among samples. The relationship between pairwise genetic distance and pairwise data missingness is particularly informative—although some positive correlation is expected as mutations accumulate in enzyme restriction sites that RAD relies on, there is often a clear pattern of increased pairwise missingness that occurs when the most divergent homologous allelic variants begin to be erroneously oversplit into different presumptive loci. By explicitly looking for this breakpoint as a function of clustering threshold, researchers can choose a value that allows them to maximize the number of genomic regions recovered while minimizing the erroneous oversplitting of highly divergent, but homologous loci.
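To make the distance versus missingness relationship described above concrete, here is a minimal sketch of how it could be computed for the datasets produced at different clustering thresholds. It is not the authors' pipeline. It assumes genotypes coded as 0/1/2 with None for missing data, defines pairwise missingness as the fraction of loci lacking data in at least one of two samples, and uses a plain Pearson correlation as the summary. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt

MISSING = None  # hypothetical marker: no data for this sample at this locus


def pairwise_missingness(a, b):
    """Fraction of loci with no data in at least one of the two samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must cover the same non-empty set of loci")
    absent = sum(1 for x, y in zip(a, b) if x is MISSING or y is MISSING)
    return absent / len(a)


def pairwise_distance(a, b):
    """Mean absolute genotype difference over loci present in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return None  # no shared loci, so the distance is undefined
    return sum(abs(x - y) for x, y in shared) / len(shared)


def pearson(xs, ys):
    """Pearson correlation; NaN if there are too few points or no variance."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def distance_missingness_correlation(samples):
    """Correlate pairwise genetic distance with pairwise missingness."""
    dists, miss = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = pairwise_distance(samples[i], samples[j])
        if d is None:
            continue  # skip pairs with no shared loci
        dists.append(d)
        miss.append(pairwise_missingness(samples[i], samples[j]))
    return pearson(dists, miss)


if __name__ == "__main__":
    # Toy genotype matrices (rows = samples, columns = loci), one per threshold.
    datasets = {
        0.88: [[0, 0, 0, 0], [0, 1, 0, 1], [2, 2, 1, None]],
        0.99: [[0, 0, 0, 0], [0, 1, 0, None], [2, 2, None, None]],
    }
    for threshold, samples in sorted(datasets.items()):
        print(threshold, distance_missingness_correlation(samples))
```

In practice one would run this over the assemblies produced at each candidate threshold and look for the threshold at which the correlation rises sharply, which is the breakpoint the summary describes.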

Citation: McCartney‐Melstad E, Gidiş M, Shaffer HB. An empirical pipeline for choosing the optimal clustering threshold in RADseq studies. Mol Ecol Resour. 2019;19:1195–1204. https://doi.org/10.1111/1755-0998.13029

Methods summary: Applying CRISPR to detect eDNA

Article by Molly-Ann Williams and Anne Parle-McDermott both from Dublin City University

We were challenged to design and build a simple and rapid species monitoring system. Why do we need such a system? Biodiversity loss is at an all-time high, and such a system would help to support the management and conservation of fish species within aquatic environments by providing knowledge of species distribution that is traditionally gained through visual detection and counting. These traditional methods are expensive, time-consuming and can harm the species of interest. We decided that environmental DNA (eDNA) was the way to go, but we had to solve the ‘PCR problem’, i.e., avoid the repeated high-temperature cycling that PCR requires, as that would leave us with a costly, one-off device that would likely not be applied outside our lab. This got us brainstorming and led us to a novel isothermal detection method, combining Recombinase Polymerase Amplification with CRISPR-Cas detection, which simplifies the adaptation of nucleic acid detection onto a biosensor device.

This innovative methodology utilises the collateral cleavage activity of Cas12a, a nuclease guided by a highly specific single CRISPR RNA, to detect specific species from eDNA. We proved it could work for eDNA by applying the technology to the detection of Salmo salar from eDNA samples collected in Irish rivers, where presence or absence had been previously confirmed using conventional field sampling. The beauty of this advance is that it can be applied to any species in the environment. Not only does this assay solve the ‘PCR problem’, it is also a better approach for distinguishing very closely related species. We look forward to others in the field adapting it to their own favourite species of interest.

Citation: Williams M‐A, O’Grady J, Ball B, et al. The application of CRISPR‐Cas for single species identification from environmental DNA. Mol Ecol Resour. 2019;19:1106–1114. https://doi.org/10.1111/1755-0998.13045