Nominations are now open for the Harry Smith Prize 2021

The editorial board recently established a new prize that recognizes the best paper published in Molecular Ecology or Molecular Ecology Resources in the last year by a graduate student or an early career scholar with no more than five years of postdoctoral or fellowship experience. The prize is named after Professor Harry Smith FRS, who founded Molecular Ecology and served as both Chief Editor and Managing Editor during the journal’s critical early years. He continued as the journal’s Managing Editor until 2008 and went out of his way to encourage early career scholars.

In addition to his editorial work, Harry was one of the world’s foremost researchers in photomorphogenesis; his work led to concepts such as “neighbour detection” and “shade avoidance,” which are essential to understanding plant responses to crowding and competition. His research provided an early example of how molecular data could inform ecology, and in 2008 he was awarded the Molecular Ecology Prize in recognition of both his scientific and editorial contributions to the field.

As with the Molecular Ecology Prize, the winner of this annual prize is selected by an independent award committee. The Harry Smith Prize comes with a 1,000 USD cash award, an announcement in the journal and on social media, and an invitation to join the Molecular Ecology Junior Editorial Board. Please send a short supporting statement (no more than 250 words; longer submissions will not be accepted) and a PDF of the paper you are nominating to Dr. Alison Nazareno ( or Dr. Katrina West ( by Friday 31 May 2021. Self-nominations are accepted.

Summary from the authors: Contaminations contaminate common databases

Molecular barcoding of bird malaria and related parasites has uncovered a remarkable diversity of potentially cryptic species that may number in the tens of thousands, compared with the few hundred morphologically described species. The MalAvi database (Bensch et al., 2009) was initiated to organize the growing number of records of these bird blood parasites. The polymerase chain reaction (PCR) is unquestionably a powerful method to detect and identify pathogens; however, its high sensitivity comes at a cost: any of the millions of artificial DNA copies generated by PCR can serve as a template in a subsequent experiment. If such PCR contaminations go undetected, they result in erroneous records of parasites and thus misrepresent their distribution. We address this problem by re-analysing the samples behind surprising records in the MalAvi database, namely unusual host species or geographic locations for the parasites. Our analyses suggest that many of these are PCR contaminations, presumably originating from previous or parallel projects in the same laboratory. The highlighted examples are from bird parasites, but the problem of contamination, and the suggested actions to reduce such errors, should apply generally to all kinds of studies using PCR for identification.

Read the full text here.

Fig 1. The database MalAvi ( presently contains >4,400 unique mitochondrial lineages of avian malaria parasites obtained from >2,000 species of birds.

References

Bensch, S., Hellgren, O. & Pérez-Tris, J. (2009). MalAvi: A public database of malaria parasites and related haemosporidians in avian hosts based on mitochondrial cytochrome b lineages. Molecular Ecology Resources, 9, 1353–1358.

Interview with the authors: Molecular dating for phylogenies containing a mix of populations and species by using Bayesian and RelTime approaches

Written by Beatriz Mello and Sudhir Kumar

The work presents the most extensive evaluation to date of the performance of relaxed-clock methods in inferring divergence times for datasets that contain a mixture of population and species divergences. Such datasets are commonly used in phylogeography, phylodynamics, and species delimitation studies. We explored a wide range of biological scenarios, which allowed us to compare the accuracy and precision of divergence times estimated by a Bayesian method (BEAST) and a non-Bayesian method (RelTime in MEGA). Results showed that both RelTime and BEAST generally perform well and that RelTime is a reliable and computationally efficient alternative for speeding up molecular dating.

Read the full text here.

Lead author Beatriz Mello.

What led to your interest in this topic / what was the motivation for this study?

Our interest in this topic was driven by a major dilemma researchers face when analyzing data that contain molecular sequences from closely related individuals as well as from individuals of distinct species. The dilemma arises because the Bayesian framework requires a tree prior to model the inference of divergence times. Many tree priors are available, but, importantly, each models either divergences between species or divergences within species. Thus, any adopted tree prior will be suboptimal for describing the evolutionary process in datasets with mixed sampling. So our question was: although misspecified, would a single tree prior still produce good time estimates? Also, no one had previously examined how well non-Bayesian methods, which do not require the specification of priors, perform on such datasets.

What difficulties did you run into along the way? 

One of the major difficulties we faced was the computational burden of Bayesian analysis. We all know that molecular dating using Bayesian methods can be time-consuming. However, they can become onerous in computer simulation studies because many datasets need to be analyzed. Each Bayesian analysis took several hours to complete, and we had to conduct thousands of Bayesian analyses. This was not an issue with the RelTime method, which finished computing in minutes. 

What is the biggest or most surprising innovation highlighted in this study? 

Our biggest finding is that, although the tree prior will frequently be an erroneous description of biological evolution, the accuracy of time estimates is not greatly affected for most choices of tree prior. This is good news for researchers working with phylogenies containing a mix of populations and species. On top of that, RelTime is much faster than the Bayesian approach and produces similar results. This finding is important because the amount of sequence data is growing rapidly. A fast and accurate method allows hypotheses to be tested using different assumptions and data subsets, improving scientific rigor and making results easier for others to reproduce.

Moving forward, what are the next steps in this area of research?

For Bayesian methods, it will be useful to develop faster approaches. However, the excellent performance of the RelTime approach, which does not require prior specification, is very encouraging. Evolutionary simulations employing even more diverse biological conditions and tree topologies, especially those involving many sequences, will be a very useful next step, which may only be feasible with RelTime and other fast methods.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Our main message for students is that no method is almighty. For those aspiring to develop new methods, the first step is to apply different methods to a diversity of datasets and examine how the results differ, why they differ, and whether the problems discovered can be solved. Likewise, those applying new methods should use several methods and scrutinize differences in their results. It is not a good idea to assume that a popular protocol is better than others by default; we need to keep an open mind and make decisions based on evidence.

What have you learned about methods and resources development over the course of this project?

All of us learned quite a lot about the multispecies coalescent approach by analyzing simulated data, for which the correct result is known. The lesson was that some methods require many assumptions and that sometimes even small changes can have a big impact, resulting in distinct evolutionary inferences. So we need to be very careful and explore a wide range of biological assumptions. There is also a strong need for more realistic simulation studies.

Describe the significance of this research for the general scientific community in one sentence.

Researchers will now be able to decide which methods and approaches to apply in their particular dataset using results from this study.

Describe the significance of this research for your scientific community in one sentence.

The accuracy and precision of divergence time estimation for datasets that contain both intra- and interspecies molecular sequences is tested for slow (Bayesian) and fast (RelTime) molecular dating approaches.


Mello B, Tao Q, Barba-Montoya J, Kumar S. Molecular dating for phylogenies containing a mix of populations and species by using Bayesian and RelTime approaches. Mol Ecol Resour. 2021;21:122–136.

Summary from the authors: Detecting selected haplotype blocks in evolve and resequence experiments

How organisms adapt to changes in the environment is not only a central question of evolutionary biology but also relevant to the threat of ongoing global warming. Evolution experiments in controlled laboratory settings (Experimental Evolution) are a great tool for evaluating evolutionary processes. When combined with genome sequencing (Evolve and Resequence), they allow genomic changes related to adaptation to be identified. Although these genomic changes can span large parts of a chromosome (selected haplotype blocks), most approaches focus only on single genomic sites and, as a consequence, might overestimate the signal of evolution. Here, we present a novel method for detecting such selected haplotype blocks in evolve and resequence experiments. Our approach requires only a few input parameters and is based on grouping neighboring genomic sites and comparing different chromosomes. Analyzing computer simulations and experimental data, we describe distinct haplotype block patterns related to the number of genomic sites under selection and to the speed of adaptation. Our results indicate that the analysis of selected haplotype blocks indeed has the potential to deepen our understanding of adaptation.
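The summary describes the method only at a high level. As a toy illustration of its first idea, grouping neighbouring genomic sites, the Python sketch below chains adjacent sites into one block when their allele-frequency trajectories across sampled generations are strongly correlated. This is not the authors' algorithm: it ignores the comparison of different chromosomes, and the input format, the function names and the `min_corr` threshold are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trajectory is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def group_neighbouring_sites(trajectories, min_corr=0.9):
    """Greedily chain adjacent sites into candidate haplotype blocks.

    trajectories: one allele-frequency trajectory per site, ordered by
    chromosomal position; each lists the frequency of one allele at
    successive sampled generations. Assumes alleles are polarized so that
    alleles on the same haplotype change frequency in the same direction.
    A site joins the current block if its trajectory correlates with the
    previous site's at >= min_corr (hypothetical threshold); otherwise it
    starts a new block. Returns a list of blocks of site indices.
    """
    if not trajectories:
        return []
    blocks = [[0]]
    for i in range(1, len(trajectories)):
        if pearson(trajectories[i - 1], trajectories[i]) >= min_corr:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks
```

For example, four sites with trajectories [0.10, 0.30, 0.50], [0.12, 0.31, 0.52], [0.50, 0.40, 0.30] and [0.48, 0.41, 0.29] fall into two blocks, [[0, 1], [2, 3]], because the first pair changes together in one direction and the second pair in the other.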

Read the full text here.

Figure 1: Left: Flies are a powerful model organism to study temperature adaptation from standing genetic variation in evolve and resequence experiments (modified from Mallard et al., 2018). Right: Selected haplotype blocks (blue) spanning large parts of a chromosome are present in the majority of individuals after 60 generations of experimental evolution.


Otte KA, Schlötterer C. Detecting selected haplotype blocks in evolve and resequence experiments. Mol Ecol Resour. 2021;21:93–109.

Mallard F, et al. A simple genetic basis of adaptation to a novel thermal environment results in complex metabolic rewiring in Drosophila. Genome Biol. 2018;19(1):1–15.

Interview with the authors: Museum epigenomics: Characterizing cytosine methylation in historic museum specimens

Recent work has shown that it may be possible to characterize epigenetic markers from museum specimens, suggesting yet another potential contribution of collections-based research. In their recent Molecular Ecology Resources paper, Rubi et al. used ddRAD and bisulphite treatment to characterize cytosine methylation in deer mice (Peromyscus spp.). They characterized methylation in specimens from 1940, 2003, and 2013-2016. While they were able to characterize patterns in all specimens, older specimens had reduced methylation estimates, less data, and more interindividual variation in data yield than newer specimens. Rubi et al. demonstrate the promise of museum epigenetics while highlighting technical challenges that researchers should consider. Read the interview with lead author Dr. Tricia Rubi below to get a behind-the-scenes look at the research behind the paper.

Read the full paper here.

Peromyscus maniculatus skull collected in 2002 and housed in the University of Michigan Museum of Zoology collection. Photo Credit: Dr. Tricia Rubi

What led to your interest in this topic / what was the motivation for this study? 

When I wrote the original proposal for this work, the earliest papers had just been published in the field of ancient epigenomics (epigenetic studies using paleontological or archaeological specimens). My proposal centered around museum specimens, and I realized that no work had been done looking at epigenetic effects in more recent historic specimens (decades to centuries old), which comprise the bulk of museum collections. The recent field of museum genomics has already opened up a range of new directions for research using collections; I believe that museum epigenomics could be a similar frontier in collections-based research. In particular, epigenomic studies using museum collections could allow us to characterize change over time, which may help clarify the role of epigenetic effects in ecological and evolutionary processes.

What difficulties did you run into along the way? 

As is the case when developing any novel protocol, we encountered a variety of challenges and dead ends. However, we found that the main challenge for DNA methylation work using museum specimens was actually the same as the main challenge for regular genetic work using museum specimens: recovering usable amounts of DNA in the initial DNA extraction. DNA quantity and quality seemed to be a better predictor of success than specimen age; our oldest specimens (~76 years old) with higher DNA concentrations yielded amounts of methylation data similar to those from much “younger” specimens. The upside is that this challenge is already a familiar one to researchers conducting museum genomics work. Our data suggest that historic DNA samples that have been successfully used for genomic analyses are probably also well suited for methylation analyses.

What is the biggest or most surprising innovation highlighted in this study? 

I think the main takeaway from this study is that DNA methylation analyses using historic collections are feasible, even for lower quality specimens such as traditional bone preparations that are several decades old. Our oldest specimens in this study were dried skulls collected in 1940; while those specimens showed considerable variation in the amount of recoverable DNA, the specimens that yielded higher DNA concentrations performed well in our analyses.

Moving forward, what are the next steps in this area of research?

There is plenty of work to be done! In this paper we highlight future directions for both developing methodology and applying museum epigenomics to ecological and evolutionary questions. Increasing the number of sequenced methylation markers or refining protocols for targeted sequencing are some obvious first steps in improving methods. Museum epigenomics approaches could be used to tackle a variety of questions in ecological and evolutionary epigenomics. In particular, epigenomic studies using museum specimens could be used to infer gene expression in past populations, or to directly measure how epigenetic markers change over time. 

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

Developing or refining novel techniques is an important and potentially rewarding process, but it requires enormous patience, as well as correctly managed expectations about the outcomes of the work. Researchers should be prepared for slower progress and a higher failure rate. Even when protocols do work, it may be more difficult to test broader ecological hypotheses due to unforeseen problems or non-optimal results. However, the upside is that projects using novel approaches can provide an important contribution to the field regardless of the specific outcomes of the work. My advice would be to design projects with several contingency plans to ensure that publishable data can be produced, and to factor in extra time for troubleshooting each step of the novel protocols.

Describe the significance of this research for the general scientific community in one sentence.

Natural history specimens retain patterns of in vivo DNA methylation, the best studied epigenetic marker; museum epigenomics may be the next frontier in collections-based research.


Rubi TL, Knowles LL, Dantzer B. Museum epigenomics: Characterizing cytosine methylation in historic museum specimens. Mol Ecol Resour. 2020;20:1161–1170.

Associate Editor vacancies

Molecular Ecology and Molecular Ecology Resources are looking for new Editorial Board members to join the journals as Associate Editors in the key subject areas below:

  • Eco-immunology/emerging diseases/disease resistance
  • Proteomics/protein evolution
  • Computer programs/statistical approaches
  • Environmental DNA/metabarcoding

Experience with genome assemblies would also be advantageous.  

Nominations and applications are welcome and whilst scientific qualifications are paramount, we would particularly appreciate nominations and applications from suitably qualified researchers from underrepresented groups (including women, ethnic minority scientists, scientists with disabilities and other underrepresented groups). Please email nominations/applications by October 15th, 2020 to with the following items:

  • Cover letter stating the reasons for your nomination or, if applying for yourself, your interest in the role and familiarity with the journals,
  • Abbreviated CV (Education, Publications, Outreach) if you have it.

The story behind the Special Feature: Genomics of natural history collections

We are really excited to get a sneak peek into the story behind a new Special Feature in Molecular Ecology Resources focusing on the use of genomic techniques to better understand natural history collections. In this Special Feature, the authors, led by Assistant Professor Lua Lopez, compiled a broad range of studies using a variety of methods to illustrate the enormous potential of museum samples to answer questions fundamental to molecular ecology. See below for a video interview with Lua and the article. Check out the great set of articles in the special feature here.

Lua Lopez, Assistant Professor at California State University

1. What led you to put together a special issue on this topic?

My first contact with ancient genomics was during my postdoc in the Lasky Lab at PSU. As soon as I started looking for literature to help me get the project started, I realized that, except in the field of human ancient genomics, information was scattered and it was not easy to find methodological papers on wet-lab protocols or bioinformatics for this type of data. We were lacking a strong foundation of studies using a combination of ancient, historical and modern samples stored in museums. Because of all this, I wanted to put together an issue compiling a critical mass of studies using Natural History Collections (NHC) to advance the field of evolutionary biology. Although I had been thinking about this for a while, I only ventured to put it together when two new postdocs also working with NHC samples joined the lab, Dr. Kathryn Turner and Dr. Emily Bellis. The three of us, together with our postdoc supervisor Jesse Lasky, decided it was time to get this running, and I am very excited about the result.

2. Of the papers in the special feature, can you identify any broad trends?

All papers provide a significant advance in important methodological steps (from DNA extraction to data analysis), facilitating the use of NHC samples in evolutionary studies. The data used to test the methods in these papers provide a glimpse of the new research avenues that NHC samples can open.

3. What did you find the most surprising about the papers in this feature?

It was incredible to see how many fields can benefit from using NHC samples. This issue not only covers methodological aspects but also shows how NHC samples can help answer long-standing questions in metagenomics, epigenetics, conservation genomics, evolutionary ecology and phylogenetics.

4. What do you recommend to researchers trying to collect genomic data from natural history collections?

Contact as many NHCs as you can. Many collections are still not digitized, and knowing what is available can have a large impact on your experimental design. If this is your first time working with NHC samples, team up: genetic studies with NHC samples can be a big challenge (high risk, high reward). Having someone with experience to guide you is one of the best things you can do to ensure the success of your research.

5. What do you think are crucial next research steps to effectively utilizing natural history collections?

I strongly believe that the next steps include digitizing NHCs and archiving DNA data. Many NHC samples are not yet digitized, and researchers looking for a particular species can only obtain a partial picture of what’s available for their studies. How accurately we can answer particular questions is, in most cases, determined by the samples we have access to (i.e., the number of samples and their geographical and temporal distribution). In addition, any genetic data obtained from NHC samples should be publicly available. With access to larger datasets, we can not only increase the accuracy of our results but also better predict future scenarios.

6. What (if any) method advances are needed?

In the past 10–20 years, we have improved enormously in our wet-lab protocols and bioinformatics, but the intrinsic nature of DNA from NHC samples means that we still have a long way to go. Ideally, we want to standardize protocols for large taxonomic groups and identify which factors have the largest impact on DNA damage. This also applies to data-analysis pipelines: in general, the more standardized the protocols, the better, as this will help us compare results among studies and identify broad evolutionary patterns.

7. What would your message be for students about to start their first research projects on this topic?

Understand what kind of samples you have in your hands. It’s not only about how old they are; it’s also about how they were preserved after sampling and during storage, where they come from, how much material you have, etc. Many factors will influence the success of obtaining DNA of sufficient quality for downstream analysis. The same goes for the data analysis: make sure you account for the particular nature of the genetic data you are analyzing. NHC samples are precious, and destructive sampling should not be done lightly. So, always do a test run and ask all the questions you have.

Interview with the authors: the genomic basis of adaptation in an invasive sea squirt

In this interview, Professor Bo Dong tells us about his team’s recent study exploring the genomic basis of environmental adaptation in the leathery sea squirt (Styela clava), a highly invasive species of tunicate that has adapted to a broad range of environments. In this study, the authors assembled a chromosomal-level genome and transcriptome of the leathery sea squirt and undertook in situ hybridization and drug inhibition experiments in order to elucidate molecular mechanisms of adaptation. Continue reading to find out what the team found and why it matters, and click here to read the article.

Styela clava, the leathery sea squirt. Photograph by Xiang Li, an author of the study

What led to your interest in this topic / what was the motivation for this study? 

Our lab works on organ morphogenesis and developmental genomics using an ascidian model. When collecting animals from the sea in Qingdao, China, we found many leathery sea squirts. Previous research has shown that the leathery sea squirt is invasive across the globe and affects both marine biodiversity and aquaculture industries. Therefore, we were interested in revealing the genomic basis of its adaptation. In addition, the Wellcome Sanger Institute, in celebration of its 25th anniversary, held a poll of species in which the winners would have their genomes decoded. The leathery sea squirt was included in the ‘Dangerous Zone’ category of the poll, and although it did not win, this strengthened our determination to decode its genome.

The leathery sea squirt was an option in the vote for the Wellcome Sanger Institute’s ’25 Genomes for 25 Years’.

What difficulties did you run into along the way? 

To obtain a better genome assembly, we combined PacBio sequencing with the Hi-C approach. Because leathery sea squirt adults are small, it took many attempts to obtain enough high-quality DNA from a single individual for library construction. In addition, the approaches available for functional analysis in this ascidian species are fairly limited. We tried different ways to dechorionate the eggs or microinject DNA into them, but these did not work well. We are continuing this work now.

What is the biggest or most surprising finding from this study? 

Compared with the classical ascidian model species Ciona robusta, we found that Styela clava has a genome twice the size but a comparable number of genes. Another intriguing finding is that cold-shock protein genes were transferred horizontally from bacteria into the S. clava genome. The transfer of these genes provides one possible molecular mechanism by which S. clava adapts to environmental stress, particularly low-temperature stress.

Moving forward, what are the next steps for this research? 

We obtained the genetic information and molecular network of environmental adaptation and metamorphosis of leathery sea squirts through high quality genome assembly. Next, we are focusing on two further aspects of this project: 1) we are further digging into the signaling molecules that control the larval metamorphosis experimentally and 2) we plan to reveal the mechanisms for gene transfer from bacteria to ascidians.

What would your message be for students about to start their first research projects in this topic? 

First, you should know clearly which scientific questions you want to address with genome assembly approaches. Second, discuss your research projects with scientists from different backgrounds to adjust your research strategies and the analysis of your results. Third, compare your genome data with data from other species to see whether your conclusion is a universal one.

What have you learned about science over the course of this project? 

Animals are so smart. They use different and unexpected strategies to adapt to environmental stress. Genomic approaches are a powerful way to elucidate the biological mechanisms of adaptation. Experimental results are often different from your expectations.

Describe the significance of this research for the general scientific community in one sentence.

The present study provides a chromosomal-level genome for understanding environmental adaptation in invasive tunicates.

Describe the significance of this research for your scientific community in one sentence.

Our study provides the chromosomal-level genome resources of leathery sea squirt (S. clava) and a comprehensive genomic basis for understanding environmental adaptation and larval metamorphosis.


Wei, Jiankai, et al. “Genomic basis of environmental adaptation in the leathery sea squirt (Styela clava).” Molecular Ecology Resources (2020).

Summary from the authors: A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues

We aimed to sequence and compare all the DNA (i.e., the genome) of several different deer mouse (genus Peromyscus) species to understand how some deer mice survive in hot deserts with little to no water. A number of deer mouse tissue samples were available through natural history museums, which house the raw materials for genetic and biodiversity investigations, but the samples had been collected many years earlier. Older samples yield lower quality DNA that has broken into many pieces over time. Our normal sequencing procedure selectively removes small fragments of DNA, which would essentially throw away all the DNA we wanted to sequence from these older samples! To circumvent this, we used a different DNA library preparation method called linked-read sequencing (LRS). LRS uses standard short-read sequencing technology but adds information about the location of DNA fragments within the genome by bundling and barcoding DNA fragments that are located near each other prior to sequencing (i.e., it ‘links’ DNA fragments together in ‘genome space’). We found that this method improves the overall quality and completeness of genome assemblies from historical tissue samples, with less time and effort than traditional shotgun sequencing methods. This alternative method may be particularly valuable for building high-quality genome assemblies for extinct species, for which no new samples can be collected, or for endangered species that are difficult to sample or collect. LRS adds to the suite of genomic methods that continue to unlock the secrets of natural history collections and enable fine-scale genetic measurement of change through time.
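The ‘linking’ idea can be illustrated with a simplified Python sketch. It assumes that short reads have already been aligned to contigs and each carries its barcode; reads that share a barcode and lie close together on the same contig are treated as fragments of one long original molecule. The `(barcode, contig, position)` input, the function names and the `max_gap` threshold are hypothetical, and real linked-read pipelines also handle barcode sequencing errors and other complications this sketch ignores.

```python
from collections import defaultdict

def group_by_barcode(reads):
    """Bundle aligned short reads that share a barcode on the same contig.

    reads: iterable of (barcode, contig, position) tuples (hypothetical format).
    Returns {(barcode, contig): sorted list of read positions}.
    """
    groups = defaultdict(list)
    for barcode, contig, pos in reads:
        groups[(barcode, contig)].append(pos)
    return {key: sorted(positions) for key, positions in groups.items()}

def molecule_extents(reads, max_gap=50_000):
    """Split each barcode group into putative long molecules.

    Consecutive reads with the same barcode are assumed to come from one
    molecule unless they are more than max_gap bp apart (max_gap is a
    hypothetical value, not a published default).
    Returns a sorted list of (barcode, contig, start, end, n_reads).
    """
    molecules = []
    for (barcode, contig), positions in group_by_barcode(reads).items():
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                molecules.append((barcode, contig, start, prev, count))
                start, count = pos, 0
            count += 1
            prev = pos
        molecules.append((barcode, contig, start, prev, count))
    return sorted(molecules)
```

With this sketch, four reads carrying barcode AAA at positions 100, 2,500, 5,000 and 200,000 on the same contig are split into two molecules: one spanning 100 to 5,000 (three reads) and a lone read at 200,000, because the last gap exceeds `max_gap`.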

This summary was written by the study’s first author, Jocelyn Colella.

Read the full text here.

Video credit: Jocelyn Colella. Peromyscus in the field.

Full Text: Colella JP, Tigano A, MacManes MD. A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues. Mol Ecol Resour. 2020;20:871–881.

Interview with the authors: Evaluation of model fit of inferred admixture proportions

Admixture models are widely used in population genetics, but they make several simplifying assumptions that, if violated, can produce misleading estimates of individual ancestry proportions. In a recent paper published in Molecular Ecology Resources, Garcia-Erill and Albrechtsen introduce evalAdmix, a program for detecting poor fit of admixture models to empirical data. evalAdmix detects poor fit using the correlations of the residuals, that is, the differences between observed and predicted genotypes; when the assumptions of the model are not violated, the residuals of a pair of individuals should be uncorrelated. In simulation studies and analyses of empirical datasets, evalAdmix was useful in identifying model violations due to gene flow from unsampled ghost populations, continuous variation, population bottlenecks, and an incorrectly assumed number of ancestral populations. Read the full article here, and read below for an exclusive interview with lead author Genís Garcia-Erill.
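To make the idea of correlated residuals concrete, here is a minimal Python sketch under stated assumptions. It computes each individual's expected genotype at each site under the admixture model as 2 × Σ_k Q[i][k] F[j][k] (Q holds ancestry proportions, F the ancestral allele frequencies), subtracts it from the observed genotype (0, 1 or 2 allele copies), and takes the Pearson correlation of the residuals for every pair of individuals. This is a naive version, not evalAdmix's implementation: it omits the correction for bias from estimated allele frequencies that the authors describe, and the function names are hypothetical.

```python
import math

def expected_genotypes(Q, F):
    """Expected allele counts under the admixture model.

    Q: n_individuals x K ancestry proportions (rows sum to 1).
    F: n_sites x K allele frequencies in each ancestral population.
    Returns E with E[i][j] = 2 * sum_k Q[i][k] * F[j][k].
    """
    return [[2.0 * sum(q * f for q, f in zip(q_row, f_row)) for f_row in F]
            for q_row in Q]

def pearson(x, y):
    """Pearson correlation; returns nan if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def residual_correlations(G, Q, F):
    """Pairwise correlation of genotype residuals between individuals.

    G: n_individuals x n_sites observed genotypes (0, 1 or 2).
    Residual = observed - expected; under a well-fitting model the
    residuals of different individuals should be roughly uncorrelated.
    """
    E = expected_genotypes(Q, F)
    R = [[g - e for g, e in zip(g_row, e_row)] for g_row, e_row in zip(G, E)]
    return [[pearson(r_i, r_l) for r_l in R] for r_i in R]
```

For instance, with a single ancestral population at allele frequency 0.5, two individuals with identical genotypes [2, 0, 2, 0] have perfectly correlated residuals (1.0), the kind of unmodelled shared signal that a residual-correlation check is meant to flag.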

Full text: Garcia-Erill G. and Albrechtsen A. Evaluation of model fit of inferred admixture proportions. Mol Ecol Resour. 2020;20:936–949.

Admixture model and evaluation with our method applied to worldwide human genetic variation. A. Admixture proportions inferred with ADMIXTURE assuming K=5 for all human populations from the 1000 Genomes Project. B. Evaluation of the admixture model using the correlation of residuals computed with evalAdmix. Positive correlations indicate poor model fit. The correlation of residuals shows that modelling one ancestral population for each of the five major continental groups leads to poor fit within most populations, and it also provides additional information. For example, populations that are more genetically distant from the others in their group, such as the Luhya in Webuye, Kenya (LWK) or the Finnish in Finland (FIN), have higher correlations of residuals, and the residuals also reveal substructure in some populations, such as the Gujarati Indians in Houston, TX (GIH).

What led to your interest in this topic / what was the motivation for this study?

The admixture model is one of the most widely used methods in population genetics, but it has been known for some time that it has many potential issues. In particular, a recent study described very nicely how different scenarios can lead to wrong conclusions when applying the admixture model (Lawson et al. 2018). For example, they showed how multiple scenarios can lead to the same admixture results, and they also presented a method, badMixture, that can distinguish between those scenarios and evaluate model fit. However, badMixture is quite difficult to apply, so we thought it would be interesting to develop an alternative method that could help guide the interpretation of admixture model results.

What difficulties did you run into along the way?

My background is in biology, and I had limited experience in computer science and statistics when I started this project, so most of the difficulties came from learning to work in these two disciplines. The method itself was relatively straightforward, but for it to work properly we needed a way to correct the bias caused by estimating the allele frequencies. The frequency correction is only a small part of the main article, but it was where we put most of the work during the development of the method, and it ended up as a few pages of equations in the supplementary material. Another aspect that took considerable effort was the implementation, since again I did not have much experience developing software that would (hopefully) be used by other people. That made me consider things I would not usually think about.
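Why estimated frequencies need correcting can be illustrated with a small simulation. This is not the authors' correction, which is derived in their supplementary material; it only shows one simple form of the bias, under assumptions chosen for the illustration: a single population (K = 1), Hardy-Weinberg genotypes, and independent sites. When the allele frequency at each site is estimated from the same individuals whose residuals are compared, the residuals at every site sum to zero across individuals, which pushes pairwise correlations below zero even though the model is correct. The function names and simulation settings are hypothetical.

```python
import math
import random
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation across sites."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_offdiag_corr(R: List[List[float]]) -> float:
    """Mean residual correlation over all distinct pairs of individuals."""
    n = len(R)
    vals = [pearson(R[i], R[other]) for i in range(n) for other in range(i + 1, n)]
    return sum(vals) / len(vals)


def simulate(n_ind: int = 10, n_sites: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Return mean residual correlation using (true, estimated) allele frequencies."""
    rng = random.Random(seed)
    p = [rng.uniform(0.05, 0.95) for _ in range(n_sites)]
    # Hardy-Weinberg genotypes: count of the allele over two independent draws.
    G = [[(rng.random() < pj) + (rng.random() < pj) for pj in p] for _ in range(n_ind)]
    # Residuals when the true frequencies are known: the model is correct.
    R_true = [[g - 2 * pj for g, pj in zip(row, p)] for row in G]
    # Residuals when frequencies are estimated from the same individuals.
    p_hat = [sum(G[i][j] for i in range(n_ind)) / (2 * n_ind) for j in range(n_sites)]
    R_est = [[g - 2 * ph for g, ph in zip(row, p_hat)] for row in G]
    return mean_offdiag_corr(R_true), mean_offdiag_corr(R_est)


if __name__ == "__main__":
    true_corr, est_corr = simulate()
    print(f"true frequencies: {true_corr:+.3f}  estimated frequencies: {est_corr:+.3f}")
```

With ten individuals, residuals computed from the true frequencies should average near zero, while those computed from estimated frequencies should average roughly -1/(n - 1) ≈ -0.11: a spurious signal that has to be removed before positive correlations can be read as poor fit.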

What is the biggest or most surprising innovation highlighted in this study?

I think the method itself is the main result of the study. As I said, there is already a method to evaluate admixture model fit, badMixture. However, that method is rarely used, because it requires additional analyses with CHROMOPAINTER and data of high enough quality to at least call genotypes. The method we present is more broadly accessible, since it relies only on the input and output of the admixture model itself, so it can be applied directly to any dataset that has been analysed with the admixture model. It therefore provides what we think is a simple way, both to apply and to interpret, to evaluate admixture model results.

Moving forward, what are the next steps in this area of research?

There are several directions in which this work could be expanded. One we have already spent some time on is developing a firmer theoretical foundation for the correlation of residuals as a measure of model fit, for example by expressing it in terms of individual-specific FST and the distance between the populations from which individuals are sampled, in a framework similar to that of Ochoa and Storey (2016). In the end we could not work out the math and left it as a short mention in the discussion, but that would be something very nice to do. We also could not find a good way to use the residuals to measure model fit at a purely individual level (instead of relying on the relationship between pairs of individuals, as the method does now), and that would also be very nice to do. Moreover, individual allele frequencies can also be estimated with principal component analysis (PCA), so the method could be extended to evaluate how well a PCA describes population structure. Finally, what we are looking forward to most is seeing how the method is applied to different datasets and how that helps generate new scientific insights.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology? 

I am myself a student who has only recently started developing and using novel techniques in Molecular Ecology, so I am not sure I have enough experience and perspective to give useful advice. But based on my limited experience, I would say that it is important not to be afraid to jump into new areas or fields where we feel our experience is too limited, and that what at first seems very difficult often becomes more and more accessible and doable as we work on it.

What have you learned about methods and resources development over the course of this project?

I started working on this study during my Master's studies, so it has been one of my first research experiences. Basically everything I know about method development I learned during this project, from the practical skills of developing and implementing a method to how to explain it and make it accessible to the community that might want to use it. I realized that this can be very important, since it affects how many people end up using the method. Also, as a user of bioinformatics methods, I really appreciate it when a new method is easy to use and does not cause too many problems.

Describe the significance of this research for the general scientific community in one sentence.

It is important to consider the assumptions of the methods we use, since relevant violations of the assumptions might result in misleading or even meaningless results.    

Describe the significance of this research for your scientific community in one sentence.

It makes it possible, and easy, to evaluate the fit of the admixture model at the individual level in almost any context in which the admixture model is currently used, so it can be applied before concluding that a population is a mixture of others, or to help choose a meaningful number of ancestral populations.


Lawson, D. J., van Dorp, L., and Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nature Communications 2018;9(1):1–11.

Ochoa, A., and Storey, J. D. FST and kinship for arbitrary population structures I: Generalized definitions. bioRxiv 2016:083915.