In this exciting research, Jack Sullivan, Megan Smith, and colleagues use machine learning techniques to create a powerful predictive framework for phylogeographic studies. Learn about their experiences building this novel research approach!
What led to your interest in this topic / what was the motivation for this study?
We’ve been interested for some time in whether we can predict phylogeographic patterns. Initially, we attempted to predict whether unstudied species harbored cryptic diversity using climate and taxonomic information (https://royalsocietypublishing.org/doi/pdf/10.1098/rspb.2016.1529). We used taxa that were known to either harbor or lack cryptic diversity to train a Random Forest classifier, and then made predictions about unstudied taxa. We found that we could predict the presence or absence of cryptic diversity, with low error rates under cross-validation. We also saw that taxonomy was a powerful predictor of cryptic diversity, and we began to wonder why. In this study, we evaluate whether life history traits can explain this result.
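For readers unfamiliar with this kind of workflow, the following is a minimal sketch of the general idea: train a Random Forest on taxa with known labels, estimate the error rate by cross-validation, then predict for unstudied taxa. It is not the authors' pipeline. It assumes Python with NumPy and scikit-learn, and the data, variable names, and predictors are all synthetic stand-ins invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT from the study): each row is a taxon, and the
# columns are hypothetical predictors: two climate variables plus an
# integer-coded taxonomic group (a simplification of real taxonomy).
n_taxa = 200
climate = rng.normal(size=(n_taxa, 2))
taxon_group = rng.integers(0, 4, size=n_taxa)
X = np.column_stack([climate, taxon_group])

# Hypothetical label: 1 = harbors cryptic diversity, 0 = lacks it.
# In this toy setup it depends on the group and one climate variable, plus noise.
signal = (taxon_group >= 2).astype(float) + 0.5 * climate[:, 0]
y = (signal + rng.normal(scale=0.5, size=n_taxa) > 0.75).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated accuracy; the error rate is 1 - mean accuracy.
cv_accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
cv_error = 1.0 - cv_accuracy.mean()
print(f"cross-validated error rate: {cv_error:.3f}")

# Fit on all labeled taxa, then predict for five made-up "unstudied" taxa.
clf.fit(X, y)
unstudied = np.column_stack(
    [rng.normal(size=(5, 2)), rng.integers(0, 4, size=5)]
)
predicted = clf.predict(unstudied)
predicted_prob = clf.predict_proba(unstudied)[:, 1]
print(predicted, predicted_prob.round(2))
```

The cross-validated error rate is the quantity to watch here: it estimates how well the classifier would do on taxa it has not seen, which is the situation that matters when predicting for unstudied species.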
What difficulties did you run into along the way?
When trying to use life history traits to make predictions across taxonomic levels, the most difficult problem is finding appropriate traits. Many traits, while likely very informative for specific taxa, are difficult to score across taxonomic groups. Our dataset included mammals, plants, arthropods, gastropods, amphibians, and birds. The biggest difficulty was finding life history traits that we could score across all of these groups and that we hypothesized would be meaningful predictors of phylogeographic patterns.
What is the biggest or most surprising finding from this study?
Life history traits are great predictors of phylogeographic patterns. In one of the systems we studied, these traits can even replace taxonomy as a predictor, suggesting that taxonomy was serving as a proxy for these traits. We find that traits related to reproduction (e.g. reproductive mode, clutch size) and trophic level are particularly informative in our predictive framework.
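One way to probe whether taxonomy is acting as a proxy for traits is to compare models trained on different predictor sets and to inspect feature importances. The sketch below illustrates that comparison under loud assumptions: it uses Python with NumPy and scikit-learn, entirely synthetic data, and hypothetical predictor names (`taxon_group`, `repro_mode`, `clutch_size`). It is not the analysis from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_taxa = 300

# Synthetic stand-in data (NOT from the study). The hypothetical traits differ
# among taxonomic groups, and the label depends only on the traits, so in this
# toy world taxonomy predicts the label solely through the traits.
taxon_group = rng.integers(0, 4, size=n_taxa)
repro_mode = (taxon_group >= 2).astype(int)
clutch_size = rng.poisson(lam=3 + 2 * repro_mode)
noise = rng.normal(scale=0.5, size=n_taxa)
y = (repro_mode + 0.2 * clutch_size + noise > 1.3).astype(int)


def cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a Random Forest on X, y."""
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()


# Compare predictor sets: taxonomy alone versus traits alone.
feature_sets = {
    "taxonomy only": np.column_stack([taxon_group]),
    "traits only": np.column_stack([repro_mode, clutch_size]),
}
scores = {name: cv_accuracy(X, y) for name, X in feature_sets.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Impurity-based importances from a model that sees every predictor.
names = ["taxon_group", "repro_mode", "clutch_size"]
X_all = np.column_stack([taxon_group, repro_mode, clutch_size])
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_all, y)
importances = dict(zip(names, forest.feature_importances_))
print(importances)
```

If a traits-only model performs about as well as a taxonomy-only one, that is consistent with taxonomy standing in for the traits. Impurity-based importances can favor predictors with many distinct values, so they are best read alongside the cross-validated comparison rather than on their own.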
Moving forward, what are the next steps for this research?
There is a wealth of data on phylogeographic patterns available, but most studies have focused on one or a few species. The framework developed in Espíndola et al. (2016) and expanded upon here provides a mechanism for integrating these studies into a predictive framework. As data continue to become available, our approach will allow policymakers and scientists alike to make predictions about what patterns are expected in unstudied species. Further, this approach can provide insight into which life history traits drive differences in species responses to historic events, and this may allow us to begin to understand why species respond to similar events in idiosyncratic ways.
What would your message be for students about to start their first research projects in this topic?
Think early and often about how your work can be integrated into the field in a broader way. As molecular data become easier to collect, more and more single-species studies accumulate. By looking at these studies in a new light and integrating across them, we can learn a lot about communities and overarching patterns.
What have you learned about science over the course of this project?
Over the course of this project, I’ve learned to look at data in many different ways. Our initial work on this topic suggested that taxonomy was the most important predictor of phylogeographic patterns. While true, this told us little about the biology of the taxa we were studying. By delving deeper and adding life history traits to our study, we were able to draw biologically meaningful conclusions about why species responded differently to geologic and climatic events.
Describe the significance of this research for the general scientific community in one sentence.
By using machine learning, we can integrate genomic, ecological, and trait data to make predictions about how species have responded to historic events, and to understand which factors lead to idiosyncratic responses.
Describe the significance of this research for your scientific community in one sentence.
Using publicly available data and machine learning techniques, we can make predictions about phylogeographic patterns across broad taxonomic groups, and we can draw conclusions about how life history traits influence these patterns.