Interview with the authors: Molecular dating for phylogenies containing a mix of populations and species by using Bayesian and RelTime approaches

Written by Beatriz Mello and Sudhir Kumar

The work presents the most extensive evaluation to date of how well relaxed-clock methods infer divergence times for datasets that contain a mixture of population and species divergences. Such datasets are commonly used in phylogeography, phylodynamics, and species delimitation studies. A wide range of biological scenarios was explored, which allowed us to compare and contrast the accuracy and precision of divergence times for a Bayesian (BEAST) and a non-Bayesian (RelTime in MEGA) method. Results showed that both RelTime and BEAST generally perform well and that RelTime presents a reliable and computationally efficient alternative to speed up molecular dating.

Read the full text here.

Lead author Beatriz Mello.

What led to your interest in this topic / what was the motivation for this study?

Our interest in this topic was driven by a major dilemma faced by researchers when analyzing data containing molecular sequences from closely related individuals and individuals from distinct species. This is because the Bayesian framework requires a tree prior to model the inference of divergence times. There is a myriad of tree priors available, but most importantly, they either model divergence between species or intra-species divergences. Thus, the adopted tree prior will be suboptimal to describe the evolutionary process for datasets with mixed sampling. So, our question was, although misspecified, would the use of the same tree prior produce good time estimates? Also, no one has previously examined how well non-Bayesian methods perform for such datasets, as they do not require specification of priors.

What difficulties did you run into along the way? 

One of the major difficulties we faced was the computational burden of Bayesian analysis. We all know that molecular dating using Bayesian methods can be time-consuming. However, they can become onerous in computer simulation studies because many datasets need to be analyzed. Each Bayesian analysis took several hours to complete, and we had to conduct thousands of Bayesian analyses. This was not an issue with the RelTime method, which finished computing in minutes. 

What is the biggest or most surprising innovation highlighted in this study? 

Our biggest finding is that, although the tree prior will frequently be an erroneous description of biological evolution, the accuracy of time estimates is not greatly impacted for most choices of the tree prior. This is good news for researchers working with phylogenies containing a mix of populations and species. On top of that, RelTime is much faster than the Bayesian approach and produces similar results. This finding is important since the amount of sequence data is growing rapidly. A fast and accurate method allows hypothesis testing to be done using different assumptions and data subsets, improving scientific rigor and reproducibility by others.

Moving forward, what are the next steps in this area of research?

For Bayesian methods, it will be useful to develop faster approaches. However, the excellent performance of the RelTime approach that does not require prior specification is very encouraging. Evolutionary simulations employing even more diverse biological conditions and tree topologies, especially involving many sequences, will be a very useful next step, which may only be feasible with RelTime and other fast methods.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Our main message for students is to realize that no method is almighty. For those aspiring to develop new methods, our first step is to apply different methods to a diversity of datasets and examine how the results differ, why they differ, and whether we can solve the problem discovered. Likewise, it is important for those applying new methods to use different methods and scrutinize differences in results. It is not a good idea to assume that a popular protocol is better than others by default; we need to keep an open mind and make decisions with evidence.

What have you learned about methods and resources development over the course of this project?

All of us learned quite a lot about the multispecies coalescent approach by analyzing simulated data because we know the correct result. The lesson was that some methods require many assumptions and that sometimes even small changes can have a big impact, resulting in distinct evolutionary inferences. So, we need to be very careful and explore a wide range of biological assumptions. Also, there is a strong need for more realistic simulation studies.

Describe the significance of this research for the general scientific community in one sentence.

Researchers will now be able to decide which methods and approaches to apply in their particular dataset using results from this study.

Describe the significance of this research for your scientific community in one sentence.

The accuracy and precision of divergence time estimation for datasets that contain both intra- and interspecies molecular sequences are tested for slow (Bayesian) and fast (RelTime) molecular dating approaches.

References

Mello B, Tao Q, Barba-Montoya J, Kumar S. Molecular dating for phylogenies containing a mix of populations and species by using Bayesian and RelTime approaches. Mol Ecol Resour. 2021;21:122–136. https://doi.org/10.1111/1755-0998.13249

Summary from the authors: Detecting selected haplotype blocks in evolve and resequence experiments

How organisms adapt to changes in the environment is not only a central question of evolutionary biology but also relevant to the threat of recent global warming. Evolution experiments in controlled laboratory settings (Experimental Evolution) are a great tool for evaluating evolutionary processes. When combined with genome sequencing (Evolve and Resequence), genomic changes related to adaptation can be identified. Although these genomic changes can occur in large parts of a chromosome (selected haplotype block), most approaches focus only on single genomic sites, and in consequence might overestimate the signal of evolution. Here, we present a novel method for detecting such selected haplotype blocks in evolve and resequence experiments. Our approach requires only few input parameters and is based on the grouping of neighboring genomic sites and on a comparison of different chromosomes. Analyzing computer simulations and experimental data, we describe distinct haplotype block patterns related to the number of genomic sites under selection and to the speed of adaptation. Our results indicate that the analysis of selected haplotype blocks has indeed the potential to deepen our understanding of adaptation.

Read the full text here.

Figure 1: Left: Flies are a powerful model organism to study temperature adaptation from standing genetic variation in evolve and resequence experiments (modified from Mallard et al., 2018). Right: Selected haplotype blocks (blue) spanning large parts of a chromosome are present in the majority of individuals after 60 generations of experimental evolution.

References

Otte KA, Schlötterer C. Detecting selected haplotype blocks in evolve and resequence experiments. Mol Ecol Resour. 2021;21:93–109. https://doi.org/10.1111/1755-0998.13244

Mallard F, et al. A simple genetic basis of adaptation to a novel thermal environment results in complex metabolic rewiring in Drosophila. Genome Biol. 2018;19(1):1–15. https://doi.org/10.1186/s13059-018-1503-4

Interview with the authors: Museum epigenomics: Characterizing cytosine methylation in historic museum specimens

Recent work has shown that it may be possible to characterize epigenetic markers from museum specimens, suggesting yet another potential contribution of collections-based research. In their recent Molecular Ecology Resources paper, Rubi et al. used ddRAD and bisulphite treatment to characterize cytosine methylation in deer mice (Peromyscus spp.). They characterized methylation in specimens from 1940, 2003, and 2013-2016. While they were able to characterize patterns in all specimens, older specimens had reduced methylation estimates, less data, and more interindividual variation in data yield than did new specimens. Rubi et al. demonstrate the promise of museum epigenetics while highlighting technical challenges that researchers should consider. Read the interview with lead author Dr. Tricia Rubi below to get a behind-the-scenes look at the research behind the paper.

Read the full paper here.

Peromyscus maniculatus skull collected in 2002 and housed in the University of Michigan Museum of Zoology collection. Photo Credit: Dr. Tricia Rubi

What led to your interest in this topic / what was the motivation for this study? 

When I wrote the original proposal for this work, the earliest papers had just been published in the field of ancient epigenomics (epigenetic studies using paleontological or archaeological specimens). My proposal centered around museum specimens, and I realized that no work had been done looking at epigenetic effects in more recent historic specimens (decades to centuries old), which comprise the bulk of museum collections. The recent field of museum genomics has already opened up a range of new directions for research using collections; I believe that museum epigenomics could be a similar frontier in collections-based research. In particular, epigenomic studies using museum collections could allow us to characterize change over time, which may help clarify the role of epigenetic effects in ecological and evolutionary processes.

What difficulties did you run into along the way? 

As is the case when developing any novel protocol, we encountered a variety of challenges and dead ends. However, we found that the main challenge for DNA methylation work using museum specimens was actually the same as the main challenge for regular genetic work using museum specimens: recovering usable amounts of DNA in the initial DNA extraction. DNA quantity and quality seemed to be a better predictor of success than specimen age; our oldest specimens (~76 years old) with higher DNA concentrations yielded a similar amount of methylation data to much “younger” specimens. The upside is that this challenge is already a familiar one to researchers conducting museum genomics work. Our data suggest that historic DNA samples that have been successfully used for genomic analyses are probably also well suited for methylation analyses.

What is the biggest or most surprising innovation highlighted in this study? 

I think the main takeaway from this study is that DNA methylation analysis using historic collections is feasible, even for lower quality specimens such as traditional bone preparations that are several decades old. Our oldest specimens in this study were dried skulls collected in 1940; while those specimens showed considerable variation in the amount of recoverable DNA, the specimens that yielded higher DNA concentrations performed well in our analyses.

Moving forward, what are the next steps in this area of research?

There is plenty of work to be done! In this paper we highlight future directions for both developing methodology and applying museum epigenomics to ecological and evolutionary questions. Increasing the number of sequenced methylation markers or refining protocols for targeted sequencing are some obvious first steps in improving methods. Museum epigenomics approaches could be used to tackle a variety of questions in ecological and evolutionary epigenomics. In particular, epigenomic studies using museum specimens could be used to infer gene expression in past populations, or to directly measure how epigenetic markers change over time. 

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Developing or refining novel techniques is an important and potentially rewarding process, but it requires enormous patience, as well as correctly managed expectations about the outcomes of the work. Researchers should be prepared for slower progress and a higher failure rate. Even when protocols do work, it may be more difficult to test broader ecological hypotheses due to unforeseen problems or non-optimal results. However, the upside is that projects using novel approaches can provide an important contribution to the field regardless of the specific outcomes of the work. My advice would be to design projects with several contingency plans to ensure that publishable data can be produced, and to factor in extra time for troubleshooting each step of the novel protocols.

Describe the significance of this research for the general scientific community in one sentence.

Natural history specimens retain patterns of in vivo DNA methylation, the best studied epigenetic marker; museum epigenomics may be the next frontier in collections-based research.

References

Rubi TL, Knowles LL, Dantzer B. Museum epigenomics: Characterizing cytosine methylation in historic museum specimens. Mol Ecol Resour. 2020;20:1161–1170.

Summary from the authors: A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues

We aimed to sequence and compare all the DNA (i.e., the genome) of a bunch of different deer mouse (genus Peromyscus) species to understand how some deer mice survive in hot deserts with little to no water. A number of deer mouse tissue samples were available through natural history museums, which house the raw materials for genetic and biodiversity investigations, but the samples had been collected many years earlier. Older samples produce lower quality DNA that has been broken into many pieces over time. Our normal sequencing procedure selectively removes small fragments of DNA, which would essentially throw away all the DNA we wanted to sequence for these older samples! To circumvent this, we used a different DNA library preparation method called linked-read sequencing (LRS). LRS uses standard short-read sequencing technology, but adds information about the location of DNA fragments within the genome by bundling and barcoding DNA fragments that are located near each other prior to sequencing (i.e., it ‘links’ DNA fragments together in ‘genome-space’). We found that this method improves the overall quality and completeness of genome assemblies from historical tissue samples, in less time and with less effort than traditional shotgun sequencing methods. This alternative method may be particularly valuable for building high-quality genome assemblies for extinct species, for which no new samples are being collected, or for endangered species that are difficult to sample or collect. LRS adds to the suite of genomic methods that continue to unlock the secrets of natural history collections and enable fine-scale genetic measurement of change through time.

This summary was written by the study’s first author, Jocelyn Colella.

Read the full text here.

Video credit: Jocelyn Colella. Peromyscus in the field.

Full Text: Colella JP, Tigano A, MacManes MD. A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues. Mol Ecol Resour. 2020;20:871–881.

Interview with the authors: Evaluation of model fit of inferred admixture proportions

Admixture models are widely used in population genetics, but they make several simplifying assumptions, which, if violated, could result in misleading estimates of individual ancestry proportions. In a recent paper published in Molecular Ecology Resources, Garcia-Erill and Albrechtsen introduce evalAdmix, a program for detecting poor fit of admixture models to empirical data. evalAdmix uses the correlations of the residual differences between true and predicted genotypes to detect poor fit; when the assumptions of the model are not violated, the residuals of a pair of individuals should be uncorrelated. In simulation studies and analyses of empirical datasets, evalAdmix was useful in identifying model violations due to gene flow from unsampled ghost populations, continuous variation, population bottlenecks, and an incorrect assumed number of ancestral populations. Read the full article here, and read below for an exclusive interview with lead author Genís Garcia-Erill.
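The residual-correlation idea described above can be illustrated with a toy sketch. This is not the evalAdmix implementation: the function names are hypothetical, a plain Pearson correlation stands in for the published statistic, and the correction for the bias introduced by frequency estimation (discussed in the interview below) is left out entirely. It assumes genotypes coded as 0, 1 or 2, admixture proportions `q` (individuals by ancestral populations) and allele frequencies `f` (ancestral populations by sites).

```python
from statistics import mean


def predicted_genotypes(q, f):
    """Expected genotype 2 * sum_k q[i][k] * f[k][j] for individual i at site j."""
    n_sites = len(f[0])
    return [
        [2.0 * sum(q_ik * f_k[j] for q_ik, f_k in zip(q_i, f)) for j in range(n_sites)]
        for q_i in q
    ]


def residuals(g, q, f):
    """Observed minus predicted genotype, per individual and site."""
    pred = predicted_genotypes(q, f)
    return [[g_ij - p_ij for g_ij, p_ij in zip(g_i, p_i)] for g_i, p_i in zip(g, pred)]


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def residual_correlations(g, q, f):
    """Correlation of residuals for every pair of individuals (i < j)."""
    r = residuals(g, q, f)
    n = len(r)
    return {(i, j): pearson(r[i], r[j]) for i in range(n) for j in range(i + 1, n)}


# Toy data (made up): two hidden groups forced into a single ancestral population (K = 1).
genotypes = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
proportions = [[1.0], [1.0], [1.0], [1.0]]
frequencies = [[0.5, 0.5, 0.5, 0.5]]
correlations = residual_correlations(genotypes, proportions, frequencies)
```

In this toy case the K = 1 model is too simple, so individuals from the same hidden group have positively correlated residuals, which is the kind of signal the authors describe as indicating a bad fit.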

Full text: Garcia-Erill G. and Albrechtsen A. Evaluation of model fit of inferred admixture proportions. Mol Ecol Resour. 2020;20:936–949. https://doi.org/10.1111/1755-0998.13171.

Admixture model and evaluation with our method applied to worldwide human genetic variation. A. Admixture proportions inferred with ADMIXTURE assuming K=5 for all human populations from the 1000 Genomes Project. B. Evaluation of the admixture model with the correlation of residuals computed with evalAdmix. Positive correlations are indicative of a bad model fit. The correlation of residuals shows that modelling with an ancestral population for each of the 5 major continental groups leads to a bad fit within most populations, and it also gives additional information. For example, populations that are more genetically distant from the rest of the group they are assigned to, like Luhya in Webuye, Kenya (LWK) or Finnish in Finland (FIN), have higher correlations of residuals, and the correlations indicate the presence of substructure in some populations, like the Gujarati Indians in Houston, TX (GIH).

What led to your interest in this topic / what was the motivation for this study?

The admixture model is one of the most used methods in population genetics, but it has been known for some time that there are many potential issues with it. Specifically, a recent study described very nicely different scenarios that can lead to wrong conclusions when applying the admixture model (Lawson et al. 2018). For example, they showed how multiple scenarios can lead to the same admixture results, and they also presented a method, badMixture, that can distinguish between those scenarios and evaluate model fit. However, badMixture is quite difficult to apply, so we thought it would be interesting to develop an alternative method that could help in guiding the interpretation of admixture model results.

What difficulties did you run into along the way?

My background is in Biology and I had limited experience in computer science and statistics when I started with this project, so most of the difficulties were related to my learning how to work in these two disciplines. The method itself was relatively straightforward, but in order for it to work properly we needed to find a way to correct the bias caused by the frequency estimation. The frequency correction is only a small part of the main article, but it was where we put most of the work during the development of the method; that ended up as a few pages full of equations in the supplementary material. Another aspect where I had to put considerable effort was in making the implementation, since again I did not have much experience in developing software that would (hopefully) be used by other people. That made me consider things I would not usually think about.

What is the biggest or most surprising innovation highlighted in this study?

I think the method itself is the main result of the study. As I said there is already a method to evaluate the admixture model fit, badMixture. However that method is rarely used, because it requires performing additional analyses with CHROMOPAINTER and also requires having data with good enough quality to at least call genotypes. The method we present is more generally accessible since it is based on information unique to the admixture model itself, meaning one can directly apply it to any data set to which the admixture model has been applied. So it provides what we think is a simple way, both in the application and in the interpretation, to evaluate the admixture model results.

Moving forward, what are the next steps in this area of research?

There are several directions in which this work could be expanded. Something we already spent some time on is trying to develop a firmer theoretical foundation for the correlation of residuals as a measure of model fit, for example expressing it in terms of individual-specific Fst and the distance between the populations from which they are sampled, in a framework similar to that in Ochoa and Storey (2018). In the end we could not figure out the math and left it as a short mention in the discussion, but that would be something very nice to do. We also could not find a good way to use the residuals to develop some sort of measure of model fit at a purely individual level (instead of depending on the relationship between pairs of individuals, as it does right now), and that would also be very nice to do. Moreover, individual frequencies can also be calculated using principal component analyses, so this method could be expanded to work as an evaluation of a PCA as a description of population structure. Finally, what we are looking forward to the most is seeing how the method is applied to different datasets and how that helps gain new scientific insights.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

I am myself a student who has very recently started developing and using novel techniques in Molecular Ecology, so I am not sure if I have enough experience and perspective to give any useful advice. But based on my limited experience, I would say that it is important not to be afraid to jump into new areas or fields where we feel like we might have too limited experience, and that often what at first seems very difficult will become more and more accessible and doable as we work on it.

What have you learned about methods and resources development over the course of this project?

I started working on this study during my Master's studies, so it has been one of my first research experiences. Basically all I know about method development I learned during the course of this project, from the more practical skills related to developing and implementing a method to how to explain it and make it accessible to the community that might be interested in using it. I realized that this can actually be very important, since it will affect how many people end up using it. Also, as a user of bioinformatics methods, I really appreciate it when a new method is easy to use and does not create too many problems.

Describe the significance of this research for the general scientific community in one sentence.

It is important to consider the assumptions of the methods we use, since relevant violations of the assumptions might result in misleading or even meaningless results.    

Describe the significance of this research for your scientific community in one sentence.

It makes it possible and easy to evaluate the model fit of the admixture model at the individual-level in almost any context in which the admixture model is currently used, so it can be applied before concluding a population is a mixture of others, or it can help to choose a meaningful number of ancestral populations.

References

Lawson, D. J., van Dorp, L., and Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nature Communications 2018;9(1):1-11.

Ochoa, A. and Storey, J. D. FST and kinship for arbitrary population structures I: Generalized definitions. BioRxiv 2016: 083915.

Summary from the authors: Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest

Biodiversity inventories can now be built by collecting and sequencing DNA from the environment, which is not only easier, faster and cheaper than direct observation, but also much more comprehensive and systematic. In particular, this gives unprecedented access to little-known microbial diversity. Tapping these data to answer community ecology questions, however, can prove a daunting task, as classical statistical approaches often fall short of the size and complexity of molecular datasets. To uncover the spatial structure of soil biodiversity over 12 ha of primary tropical forest in French Guiana, we borrowed a probabilistic model from text analysis. After demonstrating the performance of the method on simulated data, we used it to capture the co-occurrence and covariance patterns of more than 25,000 taxa of bacteria, protists, fungi and metazoans across 1,131 soil samples, collected every 10 m – a dataset that led to a previous publication in Mol. Ecol. (Zinger et al. 2019). We find that, even though the forest plot is at first sight rather uniform, bacteria, protists and fungi are all clearly structured into three assemblages matching the environmental heterogeneity of the plot, whereas metazoans are unstructured at that scale. We then work through the practical problems ecologists may encounter using this approach, such as whether to use presence-absence or read-count data, how to choose the number of assemblages and how to assess the robustness of the results. Finally, we discuss the potential use of related methods in community ecology and biogeography, and argue that probabilistic models are a way forward for analyzing the ever-expanding amount of data generated by the field.
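For readers unfamiliar with the text-analysis model mentioned above, the generative assumption behind Latent Dirichlet Allocation can be sketched in a few lines, with soil samples playing the role of documents, taxa the role of words and assemblages the role of topics. This is only an illustration of the model's generative story under made-up parameters, not the authors' inference code; the function names and all numbers are hypothetical.

```python
import random


def dirichlet(alpha, rng):
    """Draw one vector from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_sample(assemblages, alpha, n_reads, rng):
    """Simulate read counts per taxon for one soil sample under the LDA model.

    assemblages: one probability vector over taxa per assemblage (the "topics").
    alpha: Dirichlet parameters controlling how mixed each sample is.
    """
    theta = dirichlet(alpha, rng)  # mixture of assemblages in this sample
    n_taxa = len(assemblages[0])
    counts = [0] * n_taxa
    for _ in range(n_reads):
        k = rng.choices(range(len(assemblages)), weights=theta)[0]  # pick an assemblage
        t = rng.choices(range(n_taxa), weights=assemblages[k])[0]   # pick a taxon from it
        counts[t] += 1
    return theta, counts


# Hypothetical example: two assemblages over four taxa, 50 reads in one sample.
rng = random.Random(0)
assemblages = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, counts = simulate_sample(assemblages, alpha=[0.5, 0.5], n_reads=50, rng=rng)
presence_absence = [c > 0 for c in counts]  # the alternative input format discussed above
```

Fitting the model runs this story in reverse: given the observed counts or presence-absence table, it estimates the assemblages and each sample's mixture.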

Left: Primary tropical forest understory on the plot where the data were collected, Nouragues ecological research station, French Guiana. Right: Spatial distribution of assemblages of co-occurring soil taxa (OTUs), obtained by Latent Dirichlet Allocation from OTU presence-absence information only, over a 12-ha plot of plateau forest sampled every 10 m (top); and two main axes of environmental variation over the same forest plot, derived from Airborne Laser Scanning (bottom). Bacteria, protists and fungi exhibit a spatial pattern matching the environmental heterogeneity of the plot: the blue, green and red assemblages match the terra firme, hydromorphic and exposed rock parts of the plot, respectively. In contrast, metazoans such as annelids can be shown to be spatially unstructured at that scale. Sampled locations are indicated by dark dots, and values have been interpolated between samples using kriging.

Full article:
Sommeria-Klein G, Zinger L, Coissac E, et al. (2020). Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest. Molecular Ecology Resour. 20:371–386. https://doi.org/10.1111/1755-0998.13109

References:
Zinger, L., Taberlet, P., et al. (2019). Body size determines soil community assembly in a tropical forest. Molecular Ecology, 28(3), 528–543. https://doi.org/10.1111/mec.14919

Interview with the authors: Modelling multilocus selection in an individual‐based, spatially‐explicit landscape genetics framework

Genetic variation in natural systems is complex and affected by a variety of processes, and this reality has contributed to the growing popularity of simulation-based approaches that can help researchers understand the processes acting in their systems. Despite the flexibility of simulation-based approaches, simulations of natural selection across a heterogeneous landscape have typically been limited to one or two loci (e.g. Landguth, Cushman, & Johnson, 2012). In a recent issue of Molecular Ecology Resources, Landguth et al. introduce an approach to model multilocus selection in a spatially-explicit, individual-based framework, implemented in the programs CDPOP and CDMetaPOP. Read the interview with lead author Erin Landguth below to learn about the challenges in developing this program, the potential of this approach to help understand complex genotype-environment associations, and the benefits of working with a strong multidisciplinary team! Read the full article here.

Dr. Erin Landguth coding in CDPOP.

What led to your interest in this topic / what was the motivation for this study? 

Over the last two decades, there has been an exponential increase in landscape genetic studies, and still, the methodology and underlying theory of the field are under rapid and constant development. Furthermore, interest in simulating multilocus selection, including the ability to model more complex and realistic multivariate environmental scenarios, has been driven by the growing number of empirical genomic data sets derived from next-generation sequencing. We believe many of the major questions in landscape genetics require the development and application of sophisticated simulation tools to explore the interaction of gene flow, genetic drift, mutation, and natural selection in landscapes with a wide range of spatial and temporal complexities. Our interests lie in developing such tools and providing more flexible models that are linked to theory, and that better represent complex genetic variation in real systems. For example, adaptive traits often have a complex genetic basis that interacts with selection strength, gene flow, drift, and mutation rate in a multivariate environmental context; and this module provides the ability to simulate these processes across many adaptive and neutral loci in a landscape genetic context.

What difficulties did you run into along the way? 

When developing new modules for existing software packages, my first and primary goal is to validate these modules against theory where possible. This can take some time, and many decisions, questions, and rounds of trial and error come up along the way through this very important validation process. For multilocus selection, our validation process was to match simulation output with the theoretical expected change in allele frequencies for selection models developed by Sewall Wright in 1935. If the module is placed in the wrong location in the simulation workflow (i.e., timing) or if all of the Wright-Fisher assumptions are not matched exactly, then the simulation output will not match theoretical expectations. However, once all of these pieces are lined up, there is definitely a eureka moment, and I am then confident in the module’s performance for more complex scenarios where we will not be able to evaluate against theoretical expectations.
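To give a flavour of what validating against theory can look like, here is a minimal sketch that uses the standard textbook recursion for one diploid locus with two alleles under viability selection. It is a stand-in chosen for illustration: the multilocus models and the validation code actually used for CDPOP and CDMetaPOP are not reproduced here, and the function names, fitness values and tolerance are assumptions.

```python
def next_allele_freq(p, w11, w12, w22):
    """Expected frequency of allele A1 after one generation of selection.

    p is the current A1 frequency; w11, w12, w22 are the relative fitnesses
    of genotypes A1A1, A1A2 and A2A2 (random mating, infinite population).
    """
    q = 1.0 - p
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22  # mean fitness
    return p * (p * w11 + q * w12) / w_bar


def expected_trajectory(p0, w11, w12, w22, generations):
    """Theoretical allele-frequency trajectory, including the starting value."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_allele_freq(freqs[-1], w11, w12, w22))
    return freqs


def matches_expectation(simulated, expected, tol):
    """True if a simulated trajectory stays within tol of the expectation."""
    return len(simulated) == len(expected) and all(
        abs(s - e) <= tol for s, e in zip(simulated, expected)
    )


# Hypothetical check: compare a (made-up) simulated trajectory with theory.
expected = expected_trajectory(p0=0.5, w11=1.0, w12=0.9, w22=0.8, generations=3)
simulated = [e + 0.001 for e in expected]  # placeholder for simulator output
ok = matches_expectation(simulated, expected, tol=0.01)
```

In a real validation the simulated trajectory would come from averaging many replicate runs of the individual-based simulator, with a tolerance chosen to reflect genetic drift in finite populations.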

What is the biggest or most surprising innovation highlighted in this study? 

Multivariate environmental selection can produce complex landscape genetic patterns, even when only a few adaptive loci are involved. The relatively simple “complex” example simulated in the paper illustrates how complicated the underlying relationships can be between allele frequencies and environmental conditions. Simulating these complex relationships will be essential for testing genotype-environment association methods in a more rigorous fashion than has been seen so far. Additionally, the ability to simulate realistic landscape genetic scenarios that reflect the environmental complexity of actual landscapes will be important for validating findings from empirical data sets. 

Outcome for simulation of a complex landscape and three loci. The three selection landscapes (Figure 1 of Landguth et al., 2020) are superimposed, with lighter (white) areas marking where all three landscapes have values of 1 and darker areas marking where all three landscapes have values of −1. The copies (either 2, 1, or 0) of the first allele for each of the three loci are plotted, where darker green genotypes have more copies of these alleles (e.g., 2, 2, 2 corresponds to 2 copies of the first allele for the first, second and third loci, respectively). The first locus is associated with the categorical landscape (X1‐Figure 1a of Landguth et al., 2020). The second locus is associated with the gradient landscape (X2‐Figure 1b of Landguth et al., 2020). The third locus is associated with the habitat fragmented landscape (X3‐Figure 1c of Landguth et al., 2020).

Moving forward, what are the next steps in this area of research?

Epigenetics! We of course have a number of applications in progress for this current module, but we have already started beta testing our next module for simulating epigenetic processes in landscape genetics.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Starting a simulation study in landscape genetics for the first time can be daunting and intimidating. Fear not, we say! As with all software packages, there will be a learning curve, but if you persevere and get past the first few hurdles (e.g., learning the ins and outs of file formats, running the program in a potentially unfamiliar programming interface), the door will be opened to unlimited questions that can be addressed with simulations in your system. Additionally, just like any other field study or experiment, simulation modeling is most informative when coupled with specific questions and hypotheses and well-thought-out study designs.

What have you learned about methods and resources development over the course of this project? 

As we begin to add more complex modules to these simulation platforms, I am increasingly relying on multidisciplinary approaches and teams. For example, development of this current module required Brenna Forester for her expertise in landscape ecology and genotype-by-environment concepts, as well as Andrew Eckert, with his in-depth knowledge of population genetics theory, particularly the history of additive vs. multiplicative models for fitness.

Dr. Brenna Forester, post-doctoral researcher at Colorado State University and recently awarded David H. Smith Conservation Research Fellow, helped integrate key genotype-by-environment concepts into the new module.

Describe the significance of this research for the general scientific community in one sentence.

We have implemented a new module into the landscape genetic simulation programs CDPOP and CDMetaPOP that allows realistic multivariate environmental gradients to drive selection in a multilocus, individual-based, landscape genetic framework.

Describe the significance of this research for your scientific community in one sentence.

This new simulation module provides a valuable addition to the study of landscape genetics, allowing for explicit evaluation of the contributions and interactions between demography, gene flow, and selection-driven processes across multilocus genetic architectures and complex, multivariate environmental and landscape conditions.

References

Landguth EL, Forester BR, Eckert AJ, et al. (2020). Modelling multilocus selection in an individual-based, spatially-explicit landscape genetics framework. Molecular Ecology Resources, 20, 605–615. https://doi.org/10.1111/1755-0998.13121

Landguth, E. L., Cushman, S. A., & Johnson, N. A. (2012). Simulating natural selection in landscape genetics. Molecular Ecology Resources, 12, 363–368. https://doi.org/10.1111/j.1755-0998.2011.03075.x

Wright, S. (1935). Evolution in populations in approximate equilibrium. Journal of Genetics, 30, 257–266. https://doi.org/10.1007/BF02982240

Nominations for the Harry Smith Prize in Molecular Ecology

The editorial board of the journal Molecular Ecology is seeking nominations for the Harry Smith Prize, which recognizes the best paper published in Molecular Ecology in the previous year by graduate students or early career scholars with no more than five years of postdoctoral or fellowship experience. The prize comes with a cash award of US$1000 and an announcement in the journal and in the Molecular Ecologist. The winner will also be asked to join a junior editorial board for the journal to offer advice on changing research needs and potentially serve as a guest editor. The winner of this annual prize is selected by the junior editorial board.

The prize is named after Professor Harry Smith FRS, who founded the journal and served as both its Chief and Managing Editor during the journal’s critical early years. He continued as the journal’s Managing Editor until 2008, and he went out of his way to encourage early career scholars. In addition to his editorial work, Harry was one of the world’s foremost researchers in photomorphogenesis, where he determined how plants respond to shading, leading to concepts such as “neighbour detection” and “shade avoidance,” which are fundamental to understanding plant responses to crowding and competition. More broadly his research provided an early example of how molecular data could inform ecology, and in 2008 he was awarded the Molecular Ecology Prize that recognized both his scientific and editorial contributions to the field.

Please send a PDF of the paper you are nominating, with a short supporting statement (no more than 250 words; longer submissions will not be accepted) directly to Dr. Janna Willoughby (jrw0107@auburn.edu) by 31 May 2020. Self-nominations are accepted.

Interview with the authors: Environmental heterogeneity and not vicariant biogeographic barriers generate community‐wide population structure in desert‐adapted snakes

Phylogeographic studies have long focused on striking biogeographic barriers, and comparative phylogeography often looks for shared divergence across such barriers as evidence of shared responses to similar environments across taxa. However, in addition to such barriers, geographic distances and local adaptation to environmental heterogeneity may shape genetic divergence. In their recent Molecular Ecology paper, Myers and colleagues collect genomic data from 13 co-distributed species of snakes from Southwestern North America and evaluate the relative importance of biogeographic barriers, geographic distance, and environmental heterogeneity in structuring genetic divergence. Much of the previous phylogeographic work in this region has focused on divergence across a prominent biogeographic barrier: the Cochise Filter Barrier (CFB), which separates the Sonoran and Chihuahuan Deserts, and divergence across this barrier has been suggested to be an important factor driving divergence in snakes from the region. Though they expected to find a prominent role of this barrier, instead, Myers and colleagues find strong support for geographic distance and environmental heterogeneity as important factors structuring genetic divergence, but less support for biogeographic barriers. Further, they find that different variables contribute most to divergence across the 13 taxa studied, highlighting the importance of species-specific responses to environmental variation. Read the full article here, and read below for a behind-the-scenes interview with lead author Edward Myers.

What led to your interest in this topic / what was the motivation for this study? 
As a research team, we have a general interest in what factors are promoting population genetic differentiation and whether codistributed species have similar evolutionary histories in response to shared environmental changes over time. Specifically, in this system where there is a well-known biogeographic barrier (Cochise Filter Barrier; CFB), we were interested in whether entire assemblages of taxa show similar population structure. Initially the motivation for this study was to assess the degree of co-divergence across the CFB. However, as we analyzed these data, it became clear that we needed to incorporate spatial and environmental data to understand population divergence. This study also allowed me to spend a significant amount of time in the field collecting tissue samples from snakes!

What difficulties did you run into along the way? 
One of the biggest difficulties with this study was handling and analyzing all the generated data. We had almost 400 samples sequenced for RADseq, so processing and analyzing these data took a significant amount of computational time. Another difficulty was the logistics of collecting fresh tissue samples for all of these species across the southwestern US and northern Mexico, but issues like this are easily overcome by collaborating.

What is the biggest or most surprising finding from this study? 
The biggest surprise from this study is that patterns of isolation-by-distance and isolation-by-environment are more important in explaining population genetic differentiation than a commonly cited biogeographic barrier. This result really stresses the importance of incorporating spatial analyses when analyzing phylogeographic data, because aspatial analyses may produce spurious inferences of population structure and mislead our ideas of what is driving population divergence and speciation.

Moving forward, what are the next steps for this research? 
Moving forward I plan to generate whole genome sequence data for species within this system to understand what loci may be under selection in response to environmental heterogeneity. Given the strong signature of IBE I expect to find patterns of strong selection along transects of temperature and precipitation across the Sonoran and Chihuahuan Deserts. Further, I am interested in how other regions globally that have been cited as important biogeographic barriers in phylogeographic studies might also be strongly influenced by patterns of IBD and IBE, and not vicariant barriers.

What would your message be for students about to start their first research projects in this topic? 
There is so much great work published in the field of landscape genetics and comparative phylogeography and I would suggest that students start by combing through that work first. But as general advice I would suggest that students really explore their data in a meaningful way and spend some time thinking about what factors could be responsible for similar patterns observed in a genomic data set (e.g., IBD vs vicariance or selection vs historical demography).

What have you learned about science over the course of this project? 
I have really learned that genomic data should be analyzed carefully, so that the analysis is not influenced by preconceived ideas about the system you are working in. Also, and I think this is becoming more and more true, you have to collaborate in order to do great science.

Describe the significance of this research for the general scientific community in one sentence.
This work demonstrates that codistributed species do not have shared evolutionary histories, and that they do not respond to the same landscape and shared environment in similar ways.

Describe the significance of this research for your scientific community in one sentence.
Our work shows that simple patterns of isolation-by-distance and isolation-by-environment have contributed to population genetic differentiation more so than commonly cited biogeographic barriers.

Full article: Myers EA, Xue AT, Gehara M, et al. Environmental heterogeneity and not vicariant biogeographic barriers generate community‐wide population structure in desert‐adapted snakes. Mol Ecol. 2019;28:4535–4548. https://doi.org/10.1111/mec.15182

Interview with the authors: Linking plant genes to insect communities: Identifying the genetic bases of plant traits and community composition

Much research in community genetics attempts to understand how genetic variation influences community composition, but the majority of studies have been done at the level of the genotype. In their new Molecular Ecology paper, Barker and colleagues use genome-wide association mapping in aspen (Populus tremuloides) to identify specific genes that may influence variation in tree traits or in insect communities. They uncover 49 SNPs that are significantly associated with tree traits or insect community composition. Notably, insects with closer associations with host plants have more genetic correlations than less closely associated insects. Barker and colleagues find a SNP associated with insect community diversity and the abundance of interacting species, providing a link between genetic variation in aspen and insect community composition. Finally, they find that tree traits explain some of the significant relationships between SNPs and insect community composition, suggesting a mechanism by which these genes may influence community composition. Read the full article here, and get a behind-the-scenes interview with lead author Hilary Barker below.

What led to your interest in this topic / what was the motivation for this study? 
For some time, we have been interested in extended phenotypes – the idea that the genes of an organism not only shape the immediate traits of that organism, but also extensions of these traits, such as the community of insects living on a tree. Yet, until our study, most of the previous research had focused on differences across genotypes of ‘host’ organisms (e.g., aspen, cottonwoods, evening primrose), rather than the underlying genes. Thus, there were a lot of unknowns yet to be discovered. For instance, would the genetic effects be large enough to detect and identify? Would more underlying tree genes be found for insects that are more closely associated with the tree (i.e., leaf gallers) than for free-feeding insects? Would there be an overlap between genes associated with insect communities and genes associated with particular tree traits? 

What difficulties did you run into along the way? 
I think the largest challenge of conducting a Genome-Wide Association study on a common garden of trees is the planting and maintenance of this small forest. We had 1824 trees that needed planting, phenotyping, and care. This work was most intense in the first four years of the study to ensure that each tree survived a summer drought and multiple harsh winters. The next most challenging hurdle was conducting the insect surveys. These surveys involved a large team effort and happened during some of the hottest days of the summer. 

What is the biggest or most surprising finding from this study? 
The most exciting finding from this study was the identification of an aspen gene (early nodulin-like [ENODL] transmembrane protein, Potra001060g09097) that underlies insect community composition: both diversity and the abundance of key insect species (aphids and ants). While we do not yet know the mechanism by which this gene influences insect communities, we do know that this protein is involved in the transportation of carbohydrates. Thus, it’s possible that this gene directly influences aphids and ants via their interactions with carbohydrate-rich honeydew, and/or indirectly influences insects via numerous tree traits, including both growth (size) and defense. To our knowledge, this is the first identification of allelic variation in a plant gene that is associated with a complex insect community trait (i.e., insect community composition).

Moving forward, what are the next steps for this research? 
The next step of this research is to explore how the genetic underpinnings of these aspen traits and associated insect communities may vary across different environmental gradients and with tree ontogeny. Previous research has shown that aspen growth and defense traits vary with tree age, and these traits play a significant role in determining which insects will feed upon the foliage. Thus, the genetic contributions of insect community composition may vary substantially for more mature trees. The Lindroth lab is currently working on an expanded version of this study with more detailed traits and mature (reproductive) trees. In addition, gene expression will vary with different environmental conditions, which will likely also modify which genes are most important in shaping insect communities.

What would your message be for students about to start their first research projects in this topic? 
To complete a large community genomics study such as this, you will need a few key things. First, you will need a lot of help. Start recruiting anyone and everyone in sight. Mentoring undergraduates will be essential, and ensuring that you can effectively assess the learning of your mentees and volunteers is critical (e.g., can they correctly identify X insect? Can they successfully complete X protocol in the lab?). Second, get organized. Project management platforms (e.g., Asana, MS Teams) can be really helpful for keeping track of tasks. Third, refine your R Markdown scripts. You will generate more data than you know what to do with, and thus creating R scripts to clean up, organize, and analyze your data will be a top priority. Also, if you can get a digital microscope (e.g., Dino Lite), then the tedious task of keying out insect specimens will be much easier and less cumbersome! I highly recommend it.

What have you learned about science over the course of this project? 
In terms of a genome-wide association study, it is best to have as large a sample size as possible (more genotypes and genetic variation). You do not want to invest a lot of resources into a study that has low statistical power for association testing. Also, phenotype as many traits as you can. At the onset, it is impossible to know which genes, if any, will be associated with which traits. Thus, you could end up with a lot of investment while identifying a small number of associated genes, or potentially no genes at all. 

Describe the significance of this research for the general scientific community in one sentence.
Our findings show that specific genes in a host organism can shape the composition of associated communities.

Describe the significance of this research for your scientific community in one sentence.
Complex extended phenotypes such as community composition have an identifiable genetic basis, and thus we can use this information to test and study the extent and limitations of community evolution.

Full article: Barker HL, Riehl JF, Bernhardsson C, et al. Linking plant genes to insect communities: Identifying the genetic bases of plant traits and community composition. Mol Ecol. 2019;28:4404–4421. https://doi.org/10.1111/mec.15158