Written by Beatriz Mello and Sudhir Kumar
This work presents the most extensive evaluation to date of how well relaxed-clock methods infer divergence times for datasets that contain a mixture of population and species divergences. Such datasets are commonly used in phylogeography, phylodynamics, and species delimitation studies. We explored a wide range of biological scenarios, which allowed us to compare and contrast the accuracy and precision of divergence times estimated by a Bayesian method (BEAST) and a non-Bayesian method (RelTime in MEGA). The results showed that both RelTime and BEAST generally perform well and that RelTime is a reliable and computationally efficient alternative for speeding up molecular dating.
Read the full text here: https://doi.org/10.1111/1755-0998.13249
What led to your interest in this topic / what was the motivation for this study?
Our interest in this topic was driven by a major dilemma that researchers face when analyzing data that contain molecular sequences both from closely related individuals and from individuals of distinct species. The dilemma arises because the Bayesian framework requires a tree prior to infer divergence times. A myriad of tree priors is available, but each one models either divergences between species or divergences within a species. Thus, whichever tree prior is adopted will be a suboptimal description of the evolutionary process for datasets with mixed sampling. So our question was: would a single tree prior, although misspecified, still produce good time estimates? Also, no one had previously examined how well non-Bayesian methods, which do not require the specification of priors, perform for such datasets.
What difficulties did you run into along the way?
One of the major difficulties we faced was the computational burden of Bayesian analysis. We all know that molecular dating using Bayesian methods can be time-consuming. That burden becomes onerous in computer simulation studies, where many datasets need to be analyzed. Each Bayesian analysis took several hours to complete, and we had to conduct thousands of them. This was not an issue with the RelTime method, which finished computing in minutes.
What is the biggest or most surprising innovation highlighted in this study?
Our biggest finding is that, although the tree prior will frequently be an erroneous description of biological evolution, the accuracy of time estimates is not greatly affected for most choices of tree prior. This is good news for researchers working with phylogenies containing a mix of populations and species. On top of that, RelTime is much faster than the Bayesian approach and produces similar results. This finding is important because the amount of sequence data is growing rapidly. A fast and accurate method allows hypothesis testing to be done using different assumptions and data subsets, which improves scientific rigor and makes the work easier for others to reproduce.
Moving forward, what are the next steps in this area of research?
For Bayesian methods, it will be useful to develop faster approaches. However, the excellent performance of the RelTime approach, which does not require prior specification, is very encouraging. Evolutionary simulations that employ even more diverse biological conditions and tree topologies, especially ones involving many sequences, will be a very useful next step, and they may only be feasible with RelTime and other fast methods.
What would your message be for students about to start developing or using novel techniques in Molecular Ecology?
Our main message for students is that no method is almighty. For those aspiring to develop new methods, our first step is always to apply different methods to a diversity of datasets and examine how the results differ, why they differ, and whether we can solve the problem we discover. The same holds for those applying new methods: use different methods and scrutinize the differences in results. It is not a good idea to assume that a popular protocol is better than others by default; we need to keep an open mind and make decisions based on evidence.
What have you learned about methods and resources development over the course of this project?
All of us learned quite a lot about the multispecies coalescent approach by analyzing simulated data, for which we know the correct result. The lesson was that some methods require many assumptions and that sometimes even small changes to those assumptions can have a big impact, resulting in distinct evolutionary inferences. So, we need to be very careful and explore a wide range of biological assumptions. Also, there is a strong need for more realistic simulation studies.
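Because simulated data come with known divergence times, estimates can be scored directly against the truth. The following is a minimal Python sketch of that idea, not the procedure used in the study: the function names, the choice of summary statistics, and the toy numbers are all hypothetical and serve only to illustrate what "accuracy" and "precision" can mean in this setting.

```python
from statistics import mean


def percent_error(estimated, true):
    """Signed error of one time estimate, as a percentage of the true time."""
    return 100.0 * (estimated - true) / true


def summarize(estimates, truths, intervals):
    """Score node-age estimates against known (simulated) node ages.

    intervals holds one (lower, upper) bound per node, e.g. a credibility
    or confidence interval reported by the dating method.
    """
    errors = [percent_error(e, t) for e, t in zip(estimates, truths)]
    covered = [lo <= t <= hi for t, (lo, hi) in zip(truths, intervals)]
    rel_widths = [(hi - lo) / t for t, (lo, hi) in zip(truths, intervals)]
    return {
        # Accuracy: how far estimates fall from the truth, on average.
        "mean_abs_percent_error": mean(abs(x) for x in errors),
        # Fraction of intervals that contain the true time.
        "coverage": sum(covered) / len(covered),
        # Precision: interval width relative to the true time.
        "mean_relative_width": mean(rel_widths),
    }


# Toy values, invented for illustration only (not results from the study).
true_times = [1.0, 2.0, 4.0]
est_times = [1.1, 1.8, 4.0]
intervals = [(0.8, 1.4), (1.5, 1.9), (3.0, 5.0)]

summary = summarize(est_times, true_times, intervals)
print(summary)
```

Running the same scoring over the output of two dating methods on identical simulated datasets would give a like-for-like comparison of the kind described above.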
Describe the significance of this research for the general scientific community in one sentence.
Researchers will now be able to use the results of this study to decide which methods and approaches to apply to their particular dataset.
Describe the significance of this research for your scientific community in one sentence.
The accuracy and precision of divergence time estimation for datasets that contain both intra- and interspecies molecular sequences are tested for slow (Bayesian) and fast (RelTime) molecular dating approaches.
Mello B, Tao Q, Barba-Montoya J, Kumar S. Molecular dating for phylogenies containing a mix of populations and species by using Bayesian and RelTime approaches. Mol Ecol Resour. 2021;21:122–136. https://doi.org/10.1111/1755-0998.13249