Interview with the authors: Background selection and FST: Consequences for detecting local adaptation

Recent work has suggested that background selection (BGS) may lead to incorrect inferences in FST outlier studies, generating substantial concern given the prevalence of these studies in evolutionary biology. In their recent Molecular Ecology publication, Matthey‐Doret and Whitlock investigate the effects of BGS on FST outlier tests using biologically realistic simulations, and find minimal effects. Matthey-Doret and Whitlock suggest that previous studies used unrealistic parameter values in simulations, leading to an overestimate of the effects of BGS in real studies. Read the full article here: https://onlinelibrary.wiley.com/doi/pdf/10.1111/mec.15197, and get a behind-the-scenes look at this work below.

Remi Matthey‐Doret uses his new program SimBit to study the effects of background selection (BGS) on FST.

What led to your interest in this topic / what was the motivation for this study? 
It all started with a paper by Cruickshank and Hahn (2014), in which they raise the concern that background selection could be confounded with local adaptation in FST outlier studies. Curious about this issue, Mike and I investigated the question further and quickly realized that many of these fears were based on a misinterpretation of Charlesworth et al. (1997). Indeed, Charlesworth et al. (1997) demonstrated that background selection can cause FST peaks only for extreme and unrealistic parameter sets. They highlighted that their parameter choice was unrealistic as their goal was to find extreme effects, but this important limitation of their study was sadly often ignored by their readers. We therefore decided to perform simulations of background selection with realistic parameter choices.

What difficulties did you run into along the way? 
The main difficulty was technical. We tried to run these simulations with a number of popular simulation programs, but none of them was fast enough for our needs. We quickly realized that we had to write our own simulation software (SimBit) that would perform very well, especially for simulations with a lot of genetic diversity.

What is the biggest or most surprising finding from this study? 
Starting the study, I was actually expecting that background selection would have a stronger effect on FST and that it would bias FST outlier methods to detect local adaptation. Our finding was a surprise to us, but it was also comforting to realize that the results of the many studies using FST outlier methods were probably not affected by background selection. 

Moving forward, what are the next steps for this research? 
I think there is a need for a clearer view of the relative importance of positive and negative selection in explaining patterns of genetic diversity within and between populations. I would also like to investigate further the interaction between selection coefficient and migration rate, and how it affects genetic diversity within and between populations. Such an endeavor would likely require a mixture of empirical and theoretical work.

What would your message be for students about to start their first research projects in this topic?  
I think there is a lot of intuition about the effect of linked selection in structured populations that has not been published. Talk to smart people! They may have some expectation about how background selection can affect the coalescent tree in structured populations that needs to be studied and written out.

What have you learned about science over the course of this project? 
I learned that a lot of the numerical tools that we use to analyse genetic data contain bugs (one of which is detailed in our article) and unstated (or somewhat neglected) assumptions. One must always take care to understand a particular piece of statistical software well before using it.

Describe the significance of this research for the general scientific community in one sentence.
We found that background selection does not cause peaks of population differentiation, and therefore that methods that use population differentiation to detect positive selection should be safe to use without worrying that background selection is a confounding factor.

Describe the significance of this research for your scientific community in one sentence.
We found that background selection does not cause much locus-to-locus variation in FST, and therefore that FST outlier methods to detect positive selection should be safe to use without worrying that background selection is a confounding factor.

Full article:

Matthey‐Doret R, Whitlock MC. Background selection and FST: Consequences for detecting local adaptation. Mol Ecol. 2019;28:3902–3914. https://doi.org/10.1111/mec.15197.

Interview with the authors: Lack of gene flow: Narrow and dispersed differentiation islands in a triplet of Leptidea butterfly species

A diverse array of evolutionary processes contributes to diversity and divergence, and as large genomic datasets become more readily available, our ability to distinguish these processes increases. In their recent Molecular Ecology publication, Talla et al. generate genomic data from six populations of wood white butterflies and use these data to try to tease apart the effects of introgression, recombination rate variation, selection, and genetic drift. In contrast to many previous genome-scan studies, they find no evidence of introgression or parallelism. Rather, they find support for genetic drift and directional selection as having shaped genomic divergence between species. Read the full article here: https://doi.org/10.1111/mec.15188, and learn more below with a behind-the-scenes interview with the authors.

What led to your interest in this topic / what was the motivation for this study? 
We have a general interest in understanding the contribution of different molecular mechanisms and evolutionary forces to genomic differentiation between diverging lineages. Previous research in this area has revealed a rather complex interaction between selection, genetic drift, recombination rate variation and introgression and we thought we had found an ideal study system to tell these factors apart. In addition, we believe that it will be key to describe the divergence landscapes in many different taxonomic groups to understand the relative importance of different molecular and evolutionary factors in lineages with different genetic/genomic features, demographic histories and life-history characteristics.

What difficulties did you run into along the way? 
Our expectations were not really met regarding the study system. First, earlier observations suggested that hybridization occurs between species pairs in the Leptidea group when they occur in sympatry, indicating that introgression might differ between sympatric and allopatric species pairs, but this turned out to be wrong. Second, butterflies lack centromeres and this could indicate a more even recombination landscape than what is generally observed in taxa with centromeres, but we could not address this with our data. Third, we expected that the divergence time between lineages was short, which again turned out not to be the case. Finally, the three species are characterized by large differences in karyotype, and we wanted to investigate if chromosomal rearrangements could underlie reproductive isolation, but this goal was out of reach with our data.

What is the biggest or most surprising finding from this study? 
It was surprising to us that there was no evidence for interspecific gene flow since hybrids have been observed. We were also very surprised by the deep divergence times between these virtually identical species. Besides that, we do not think the results are really surprising, but they do give some novel insight into the patterns of genomic divergence when there is no introgression and when chromosomes lack centromeres. One observation that we found interesting was that regions with high genetic differentiation (FST) had higher genetic divergence (DXY) than the genomic average. This may sound intuitive, but many previous ‘genome-scan’ studies have in fact found a negative relationship between differentiation and divergence, most likely as a consequence of reduced recombination in some regions leading to reduced diversity already before lineages started to diverge.
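The contrast between relative differentiation (FST) and absolute divergence (DXY) in the answer above can be illustrated with a small sketch. It assumes Hudson's formulation of FST as one minus the ratio of mean within-population diversity to between-population divergence; this is not necessarily the estimator used in the study, and the function name and numbers are hypothetical. It shows why a window with reduced diversity can have a high FST without any increase in DXY.

```python
def hudson_fst(pi_within, dxy):
    """Relative differentiation as 1 - (within-population diversity / between-population divergence).

    pi_within: mean nucleotide diversity within the two populations (hypothetical input)
    dxy:       mean pairwise divergence between the two populations (hypothetical input)
    """
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    return 1.0 - pi_within / dxy


# Two hypothetical genomic windows with the same absolute divergence (DXY)
# but different levels of within-population diversity.
dxy = 0.010
fst_normal_diversity = hudson_fst(pi_within=0.008, dxy=dxy)
fst_low_diversity = hudson_fst(pi_within=0.002, dxy=dxy)

# The low-diversity window has the higher FST even though DXY is identical.
print(fst_normal_diversity, fst_low_diversity)
```

Under this formulation, a window whose diversity was already low before the lineages split shows elevated FST at unchanged (or even reduced) DXY, which is the negative relationship reported in many earlier genome scans.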

Moving forward, what are the next steps for this research? 
We are developing more resources to generate genome assemblies of multiple species in the study system and we are also working on establishing high-density linkage maps for multiple populations with different karyotypes. These tools will help us pinpoint chromosome rearrangements and investigate if these have played a role in the divergence process. The data will also be used to quantify the effects of fissions and fusions on the recombination landscape. We are also delving into other approaches to understand how ecological and behavioral differences between species leave footprints in the DNA sequences or epigenetic marks (and vice versa). Given the deep divergence times between species and the apparent lack of gene flow, we will mainly focus on intraspecific comparisons where we observe some incompatibilities between some populations with distinct karyotypes.

What would your message be for students about to start their first research projects in this topic? 
We would suggest reading up on the previous literature in detail. We also encourage students to contact leading researchers in the field to discuss potential questions. Most people are really helpful and interested in knowing about other research efforts within their field. Discussing directly with experienced researchers also gives a hint about the key questions that should be addressed to extend the knowledge in the field. Given the copious amount of data we generate these days and the integrative nature of the questions we ask, it is also crucial to develop some skills in bioinformatics and scripting and to have a network of collaborators/colleagues who can provide help and support in theory, experimental studies and data analyses.

What have you learned about science over the course of this project? 
That science is an extremely time-consuming and dynamic process and that the first glimpse of the data does not necessarily reflect the final results. Moreover, that project plans need to be revised regularly when the initial strategies do not work out as outlined. We also acknowledge the importance of establishing a network of colleagues with expertise in different areas of the field. In our experience, most research projects within our field are getting more and more integrative, and it will be increasingly difficult to conduct advanced research without collaboration.

Describe the significance of this research for the general scientific community in one sentence.
We verify that genomic differentiation between diverging lineages is affected by a complex interaction between molecular mechanisms and evolutionary forces and stress the importance of studying organisms with different genomic features, demographic histories and life-history characteristics.

Describe the significance of this research for your scientific community in one sentence.
In contrast to much of the previous work on patterns of genomic diversity and differentiation, our study provides insight into divergence processes when the effects of gene flow and/or a shared and highly variable recombination landscape are absent.

Full article: Talla V, Johansson A, Dincă V, et al. Lack of gene flow: Narrow and dispersed differentiation islands in a triplet of Leptidea butterfly species. Mol Ecol. 2019;28:3756–3770. https://doi.org/10.1111/mec.15188.

Methods summary: Addressing (one of) the challenges of RADseq

Article by Evan McCartney-Melstad and Brad Shaffer from University of California at Los Angeles

RADseq is a great method for gathering genomic data to answer biological questions across many different scales, from phylogenetics to population and landscape genetics. It is fast, inexpensive, and requires no previous knowledge about the species’ genomic architecture. However, with this flexibility come challenges. In this paper we develop and bench test an approach to address what may be the biggest RADseq challenge: how to choose the right sequence similarity threshold that defines whether two non-identical sequencing reads arose from the same or different genomic locations. This problem goes to the heart of evolutionary genetics: if two sequences are considered to be homologous, or derived from the same ancestral genomic location with subsequent modification through time, then they tell us a great deal about evolutionary history. If they are paralogous, and map to separate locations, then they lack that shared evolutionary history. Getting this straight is perhaps the single most important step in using genomic data for evolutionary inference.

Heat maps showing pairwise data missingness at clustering thresholds of 88% (a) and 99% (b). 

Studies that include relatively distantly related samples, such as those asking phylogenetic or biogeographical questions, should expect that homologous sequences will have diverged over time and therefore require lower similarity thresholds that allow for that divergence. However, if the threshold is set too low, paralogs will be falsely assigned to the same genomic locus, leading to problems ranging from inflated missing data rates to inaccurate measures of genetic diversity. Rather than relying on rough guesses that are preset in software packages, our approach attempts to balance these two competing forces by quantifying the relationship between pairwise genetic relatedness (as estimated directly from the data) and summaries of the RADseq dataset including pairwise data missingness and the slope of isolation by distance among samples. The relationship between pairwise genetic distance and pairwise data missingness is particularly informative—although some positive correlation is expected as mutations accumulate in enzyme restriction sites that RAD relies on, there is often a clear pattern of increased pairwise missingness that occurs when the most divergent homologous allelic variants begin to be erroneously oversplit into different presumptive loci. By explicitly looking for this breakpoint as a function of clustering threshold, researchers can choose a value that allows them to maximize the number of genomic regions recovered while minimizing the erroneous oversplitting of highly divergent, but homologous loci.
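The relationship described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: pairwise missingness is defined here as the fraction of loci lacking data in at least one sample of a pair, the presence/absence matrices and genetic distances are made-up toy values, and all function names are hypothetical. For each clustering threshold it computes the least-squares slope of pairwise missingness on pairwise genetic distance; a marked rise in that slope between thresholds is the kind of breakpoint one would look for.

```python
from itertools import combinations


def pairwise_missingness(present):
    """Fraction of loci lacking data in at least one sample of each pair.

    present: list of per-sample lists, 1 where the sample has data at a locus, else 0.
    Returns a dict keyed by (i, j) sample-index pairs with i < j.
    """
    n_loci = len(present[0])
    result = {}
    for i, j in combinations(range(len(present)), 2):
        shared = sum(1 for a, b in zip(present[i], present[j]) if a and b)
        result[(i, j)] = 1.0 - shared / n_loci
    return result


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def missingness_slope(present, dist):
    """Slope of pairwise missingness on pairwise genetic distance."""
    miss = pairwise_missingness(present)
    pairs = sorted(miss)
    return ols_slope([dist[p] for p in pairs], [miss[p] for p in pairs])


# Toy data (hypothetical): three samples, four loci, two clustering thresholds.
dist = {(0, 1): 0.1, (0, 2): 0.1, (1, 2): 0.2}
present_by_threshold = {
    0.88: [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0]],
    0.99: [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]],
}

slopes = {t: missingness_slope(p, dist) for t, p in present_by_threshold.items()}
for threshold in sorted(slopes):
    print(threshold, round(slopes[threshold], 3))
```

In a real dataset the presence/absence matrix would come from the assembly produced at each threshold, and the slope would be examined across a series of thresholds rather than just two.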

Citation: McCartney‐Melstad E, Gidiş M, Shaffer HB. An empirical pipeline for choosing the optimal clustering threshold in RADseq studies. Mol Ecol Resour. 2019;19:1195–1204. https://doi.org/10.1111/1755-0998.13029

Methods summary: Applying CRISPR to detect eDNA

Article by Molly-Ann Williams and Anne Parle-McDermott both from Dublin City University

We were challenged to design and build a simple and rapid species monitoring system. Why do we need such a system? Biodiversity loss is at an all-time high, and such a system would help to support the management and conservation of fish species within aquatic environments by acquiring knowledge of species distribution that traditionally is gained through visual detection and counting. These traditional methods are expensive and time-consuming, and can harm the species of interest. We decided that environmental DNA (eDNA) was the way to go, but we had to solve the ‘PCR problem’, i.e., avoid the need for cycling through high temperatures, as that would leave us with a costly, one-off device that would likely not be applied outside our lab. This got us brainstorming and led us to a novel isothermal detection method, combining Recombinase Polymerase Amplification with CRISPR-Cas detection, which simplifies the adaptation of nucleic acid detection onto a biosensor device.

This innovative methodology utilises the collateral cleavage activity of Cas12a, a nuclease guided by a highly specific single CRISPR RNA, to detect specific species from eDNA. We proved it could work for eDNA by applying the technology to the detection of Salmo salar from eDNA samples collected in Irish rivers, where presence or absence had been previously confirmed using conventional field sampling. The beauty of this advance is that it can be applied to any species in the environment. Not only does this assay solve the ‘PCR problem’, it is also a better approach for distinguishing very closely related species. We look forward to others in the field adapting it to their own favourite species of interest.

Citation: Williams M‐A, O’Grady J, Ball B, et al. The application of CRISPR‐Cas for single species identification from environmental DNA. Mol Ecol Resour. 2019;19:1106–1114. https://doi.org/10.1111/1755-0998.13045

Interview with the authors: Genomic signatures of sympatric speciation with historical and contemporary gene flow in a tropical anthozoan (Hexacorallia: Actiniaria)

Though increasing numbers of empirical studies suggest that sympatric speciation may be more common than previously thought, it is difficult to quantify the prevalence of sympatric speciation, since many different processes may lead to co-distributed sister species pairs. This difficulty is particularly pronounced in marine systems where there are relatively few barriers to dispersal. A recent paper by Benjamin Titus, Paul Blischak, and Marymegan Daly provides one of the first model-based investigations of sympatric speciation in a reef system. Titus and colleagues find support for cryptic diversity in the corkscrew anemone (Bartholomea annulata), and the two lineages that they recover co-occur. Model-based analyses support isolation with migration or secondary contact, suggesting that sympatric speciation may have occurred between these lineages. Finally, Titus and colleagues identify six loci that are putatively under divergent selection between these two lineages. Below, we go behind the scenes with lead author Benjamin Titus. Read the full article here.

Photo credit: Benjamin Titus.

What led to your interest in this topic / what was the motivation for this study? 
The motivation for this study evolved quite a bit from when I initially started the project. Initially, this work was part of a broader comparative phylogeographic study. However, like many poorly studied marine inverts, the anemone turned out to be a cryptic species complex that was fully co-distributed throughout its range. Since we found no obvious ecological differences between the cryptic taxa, the project shifted focus towards testing competing biogeographic diversification scenarios. Marine systems are highly dynamic, and species that diversify in allopatry can readily become co-distributed following secondary contact. Ultimately, we wanted to use model selection analyses to make objective inferences regarding the likelihood that this species diversified sympatrically versus allopatrically followed by secondary contact.

What difficulties did you run into along the way? 
Tropical anthozoans (e.g. corals, sea anemones, zoanthids, corallimorpharians) generally harbor endosymbiotic dinoflagellates, which allow these animals to thrive in the nutrient-poor waters of the tropics. Unfortunately, there is no avoiding them in field-collected samples, and the resulting DNA extractions harbor an unknown mix of anthozoan and dinoflagellate DNA. When I started this work no universal population-level markers existed for the Class Anthozoa, so we used a reduced representation sequencing approach. Thus, our resulting RADseq dataset is, presumably, an unknown mix of target and dinoflagellate DNA. Ultimately, we were really lucky there was a full genome from a closely related species that we could map our reads to so we could be confident that we were only left with anthozoan sequences.  

What is the biggest or most surprising finding from this study? 
I think there are a couple of important takeaways. The first is that coral reefs harbor an immense amount of biodiversity on a small fraction of seafloor, and in a setting with few hard barriers to dispersal. Sympatric speciation should be a major evolutionary process on coral reefs, but it’s rarely tested for explicitly. Given that different evolutionary processes can lead to similar biogeographic outcomes, our study is a rare empirical example demonstrating the importance of sympatric speciation on reefs.
The second is that this is the first range-wide phylogeographic study for a tropical sea anemone species, and our finding that Bartholomea annulata is a species complex underscores just how underdescribed sea anemone diversity likely is.

Moving forward, what are the next steps for this research? 
Our sampling here was necessarily coarse in order to cover the entire range of this species complex in the Tropical Western Atlantic. Fine scale sampling and sequencing would be nice to try and pin down any ecological differences between these cryptic taxa that may exist. Broadly, the field of marine phylogeography needs more evolutionary studies that incorporate demographic modeling into their analyses so we can better understand the relative contributions of allopatric and sympatric speciation on coral reefs.

What would your message be for students about to start their first research projects in this topic? 
Some of the most widely recognized species are actually cryptic species complexes. If you work on a poorly studied group and want to conduct population-level research, make sure you take the time to confirm you are only dealing with a single species. This is true for any group, but is especially true for marine invertebrates.

What have you learned about science over the course of this project? 
Staying on the poorly studied taxa theme, if you work on one, there’s an immense amount of basic systematic research that needs to be done. This project came out of my dissertation research, which I developed on what I thought were common and widely recognized species. A lot of my work turned into disentangling the systematics of cryptic species complexes. This is time consuming, but important so that downstream studies are framed in the proper taxonomic context.

Describe the significance of this research for the general scientific community in one sentence.
Sympatric speciation is an important, but difficult to demonstrate, evolutionary process in the marine environment.

Describe the significance of this research for your scientific community in one sentence.
Explicit tests of competing diversification scenarios are important to disentangle different evolutionary processes that can lead to similar biogeographic outcomes on coral reefs.

Summary from the authors: Boomeranging around Australia: Historical biogeography and population genomics of the anti-equatorial fish Microcanthus strigatus (Teleostei: Microcanthidae)

Photo credit: Shigeru Harazaki.

The study of species and where they live is of particular interest to biologists, because it not only allows us to gain insight into genetic diversity, but also into how different populations interact. Animals with widespread distributions are often assumed to be of least concern. This can be misleading, as it does not take into account the possibility of fragmentation and population disjunction. The Stripey fish Microcanthus strigatus is one example, as it is listed as being of least concern on the IUCN Red List. Although it spans a wide distribution across the western Pacific and eastern Indian Oceans, our study suggests that populations in Western Australia, the southwest Pacific (including eastern Australia), Hawaii and East Asia are very genetically divergent. Several of these populations have been isolated since the last glacial cycle in the Pleistocene epoch, and are currently so fragmented that no contemporary genetic exchange occurs. This is of significant conservation concern as a once widespread population is revealed to consist of four cryptic groups, especially in light of evidence suggesting that the Hawaiian population is currently in decline and that the southwest Pacific population is distinct enough to warrant recognition as a different species. 

Read the full article:
Tea Y‐K, Van Der Wal C, Ludt WB, Gill AC, Lo N, Ho SYW. Boomeranging around Australia: Historical biogeography and population genomics of the anti‐equatorial fish Microcanthus strigatus (Teleostei: Microcanthidae). Mol Ecol. 2019;28:3771–3785. https://doi.org/10.1111/mec.15172