Interview with the authors: Applying genomic data in wildlife monitoring

Massively parallel sequencing has led to an explosion of sequence data in recent years. However, the methods used to obtain such data are usually high-cost and time-intensive, and often require high-quality samples. This limits whether, and how well, such data can be used by researchers working in applied conservation science. Here, we speak to Alina von Thaden about her recent study in Molecular Ecology Resources. Using European wildcats as a case study, Alina and co-authors present a relatively low-cost and time-efficient workflow for developing and optimising microfluidic SNP panels, which can be used to obtain SNP data from minimally invasive samples. Beyond outlining the workflow and its applications, they go so far as to estimate the costs of their pipeline, providing valuable practical information for conservation scientists. Read on for an in-depth view of this study.

Monitoring elusive European wildcats (Felis silvestris) is heavily reliant on noninvasively collected DNA samples. Photo credit: Annsophie Schmidt.

What led to your interest in this topic / what was the motivation for this study? 

We mainly work on genetic monitoring of large carnivores, and most of our research is based on noninvasively collected wildlife samples such as hair, faeces and saliva traces. The field demands very fast and reliable genetic analyses of samples with degraded DNA. And since funding is generally sparse in applied conservation, our methods need to be cost-effective and suitable for high-throughput approaches.

Genomic tools, on the other hand, usually involve large amounts of data and complex bioinformatic pipelines, and typically rely on samples with high-quality DNA. We have been looking for ways to bring the advantages of genomics to bear on the challenges of conservation monitoring. For some years now, we have been working with microfluidic arrays combined with reduced SNP panels, and we wanted to share our experiences with other labs interested in applying them.

What difficulties did you run into along the way? 

Setting up and optimizing methodological resources comes with several challenges, but there is a lot to learn! Most important to me was to remain skeptical about the results and to constantly validate them by analyzing the data from several perspectives and with different software. The validation of the technology also took a lot of extra lab hours, but we are confident that the workflow and guidelines that we present now will save others a lot of hands-on time and costs when optimizing SNP panels for degraded samples.

What is the biggest or most surprising finding from this study? 

First of all, after years of developing the framework, we applied it to a new SNP panel designed for assessing dog-wolf hybridization (to be published) and found that the lab work for generating a new ready-to-use marker panel took us only a few weeks. Seeing the approach prove effective was great and encouraged us to share it with the community.

Secondly, a large proportion of noninvasively collected samples could be genotyped with no or very few genotyping errors, compared with more traditional microsatellite-based genotyping (see also von Thaden et al. 2017). This has direct implications for genotyping costs and thus promotes the broader establishment of genomic technology in applied conservation.

Alina von Thaden collecting reference samples of European wildcat (Felis silvestris) for testing a newly developed SNP panel. Photo credit: Annsophie Schmidt.

Moving forward, what are the next steps for this research? 

One of our next steps is to apply the technology to historical samples from museum collections. Additionally, we are going to implement the SNP panel from our current paper in routine genetic monitoring of European wildcats in Germany.

We are currently developing other reduced SNP panels for a variety of endangered species in our lab, such as dormice and European bison. Besides neutral variation, we also aim to integrate functional markers, such as SNPs associated with disease susceptibility.

Further, we will test alternative platforms that will allow generating larger SNP sets for degraded samples. Ultimately, our long-term goal is the effective implementation of an “applied genomic wildlife monitoring” approach.

What would your message be for students about to start their first research projects in this topic? 

Get in contact with other groups working in this area! Sharing ideas and experience really helps to shape your project and refine the aims of your research. Most people are very cooperative and happy to contribute or answer questions.

What have you learned about science over the course of this project? 

Perseverance and tenacity. When you explore new directions, the research journey may well become bumpy and lead you somewhere other than you initially expected. But it’s worth it: keep your goal in mind and be ready to rethink your strategy.

Describe the significance of this research for the general scientific community in one sentence.

Bridging the gap between genomics and applied conservation is a key prerequisite for effective wildlife management, especially in the light of rapid biodiversity declines.

Describe the significance of this research for your scientific community in one sentence.

We demonstrate how reduced SNP panels can be efficiently developed and optimized for genotyping based on degraded wildlife samples.

References

von Thaden, A., Cocchiararo, B., Jarausch, A., Jüngling, H., Karamanlidis, A. A., Tiesmeyer, A. … Muñoz-Fuentes, V. (2017). Assessing SNP genotyping of noninvasively collected wildlife samples using microfluidic arrays. Scientific Reports, 7, 83. https://doi.org/10.1038/s41598-017-10647-w

Full paper

von Thaden, A., Nowak, C., Tiesmeyer, A., Reiners, T. E., Alves, P. C., Lyons, L. A., … & Hegyeli, Z. (2020). Applying genomic data in wildlife monitoring: Development guidelines for genotyping degraded samples with reduced single nucleotide polymorphism (SNP) panels. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13136


Summary from the authors: A metagenomic assessment of microbial eukaryotic diversity in the global ocean

Marine microbial eukaryotes are key components of planktonic ecosystems in all ocean biomes. Along with cyanobacteria, they are responsible for nearly half of global primary production, and they play important roles in food-web dynamics (as grazers and parasites), in carbon export to the deep ocean, and in nutrient remineralization. Currently, one of the most common approaches to surveying their diversity is sequencing marker genes amplified from genomic DNA extracted from microbial assemblages. However, this approach requires a PCR step, which is known to introduce biases into microbial diversity estimates. One alternative that overcomes this issue is to exploit the taxonomic information contained in metagenomes, which are generated by massive shotgun sequencing of the same DNA extracts, usually with the goal of assessing the putative functions of environmental microbes.

In this study we investigated the potential of metagenomics to provide taxonomic reports of marine microbial eukaryotes. The overall diversity reported by this approach was similar to that obtained by amplicon sequencing, although the latter performed poorly for some taxonomic groups. We then studied the diversity of picoeukaryotes and nanoeukaryotes using 91 metagenomes from surface down to bathypelagic layers in different oceans, unveiling a clear separation of taxonomic groups between size fractions and depth layers.

Overall, this study shows that metagenomics is an excellent resource for taxonomic exploration of marine microbial eukaryotes.

Summary of the relevance of main eukaryotic taxonomic groups within two size fractions of marine plankton (picoeukaryotes [0.2-3 µm] and nanoeukaryotes [3-20 µm]) and in two different layers of the global ocean (photic [0-200 m] and aphotic [200-4000 m]) as seen by metagenomics. The median of the relative abundance was calculated for each taxonomic group with samples from the 4 categories (pico-photic, pico-aphotic, nano-photic, nano-aphotic), and dots represent these median values transformed to a 0-100 scale. Dots are then colored based on the category where the taxonomic group is most relevant.
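The summary statistic in this caption can be sketched in a few lines of Python. This is a minimal sketch under one assumption: the caption does not say exactly how the medians were "transformed to a 0-100 scale", so here each taxonomic group's four category medians are divided by the largest of them and multiplied by 100, which makes the most relevant category score 100. The category labels follow the caption; the abundance values are hypothetical.

```python
from statistics import median


def scaled_medians(abundances):
    """Summarise one taxonomic group across sample categories.

    abundances maps a category label (e.g. "pico-photic") to the list of
    relative abundances of the group in that category's samples.
    Returns (scaled, best): scaled maps each category to its median rescaled
    to 0-100, and best is the category with the largest median.
    """
    medians = {cat: median(values) for cat, values in abundances.items()}
    top = max(medians.values())
    # Assumption: rescale so that the largest category median becomes 100.
    scaled = {cat: (100.0 * m / top if top > 0 else 0.0) for cat, m in medians.items()}
    best = max(medians, key=medians.get)  # category used to colour the dot
    return scaled, best


# Hypothetical relative abundances (%) of one taxonomic group.
example = {
    "pico-photic": [1, 2, 3],
    "pico-aphotic": [4, 6, 8],
    "nano-photic": [0.5, 1, 1.5],
    "nano-aphotic": [2, 2, 2],
}
scaled, best = scaled_medians(example)
print(best)  # pico-aphotic
```

Using medians rather than means keeps a few samples with unusually high abundance from dominating a category's summary.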

This summary was written by the study’s first author, Aleix Obiol.

Full article:
Obiol, A., Giner, C. R., Sánchez, P., Duarte, C. M., Acinas, S. G., & Massana, R. (2020). A metagenomic assessment of microbial eukaryotic diversity in the global ocean. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13147

Interview with the authors: Which software is best to use for de novo assembly?

Reduced representation sequencing (e.g. RAD and GBS) is becoming ever more popular, but for species which lack a reference genome, little work has been done to assess which software may be best suited to building de novo assemblies from this data. Here, we speak to Melanie LaCava of the University of Wyoming about her recent Molecular Ecology Resources article, which explores the accuracy of de novo assemblies built by various software programs using DNA generated from double-digest libraries. Melanie and her co-authors found highly variable degrees of accuracy of assemblies built by six different software programs, and discuss which programs are best suited to this application. They also highlight the importance of optimising parameter settings within any given software. Read on to get a behind-the-scenes view of this study.

The completeness of assemblies in simulations of unmutated genomes (a, d), in simulations of an equal number of SNPs and indels (b, e), and in simulations of 1–5 base pair indels (c, f). Values are reported for five assemblers: CD-HIT (green), Stacks (blue), Stacks2 (purple), Velvet (pink) and VSEARCH (orange). The hue of each color corresponds to the percent match parameter setting used in the assembly. For more information on this figure, see the caption of Figure 1 in the full article.

What led to your interest in this topic / what was the motivation for this study? 

This study began as a research project in a graduate-level course on computational biology at the University of Wyoming led by the senior author on the paper, Alex Buerkle. Dr. Buerkle initiated the project and worked with the rest of the coauthors to pursue this de novo assembly software comparison. As reduced representation genotyping-by-sequencing has become more popular, new and repurposed software programs have been applied to each step in the bioinformatics pipeline. When a reference genome is unavailable for a study species, de novo assembly is essential, yet we recognized a gap in the evaluation of software used for this important step.

What difficulties did you run into along the way? 

Technology and software associated with genotyping-by-sequencing and de novo genome assembly are rapidly changing. During the course of our project, some of the software programs we tested were significantly updated, so we chose to rerun our analyses using the new software versions to ensure we were providing up-to-date information in our manuscript.

What is the biggest or most surprising finding from this study? 

We were surprised to find such a substantial difference in performance among these assembly programs. We were especially surprised at the variation in performance among software for our first simulation where no mutations were introduced. In this scenario, we made many identical copies of genome fragments and then performed de novo assembly using each software program. Without any mutations introduced, the job is basically to generate a list of unique sequences – it should be very straightforward. In some cases, however, these genome fragments were broken into shorter sequences and rearranged beyond recognition, leading to incorrect reconstruction of the simple, unmutated data.
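To make the "list of unique sequences" idea and the percent match parameter concrete, here is a minimal Python sketch of greedy clustering: identical reads are collapsed first, then each unique sequence joins the first existing cluster representative it matches at or above the threshold. This is an illustration under simplifying assumptions (ungapped comparison of equal-length sequences only), not the algorithm of CD-HIT, Stacks, Velvet, VSEARCH or any other program tested; real assemblers align sequences and tolerate indels.

```python
def identity(a, b):
    """Ungapped percent identity of two sequences.

    Simplification: sequences of different lengths are treated as
    non-matching; real tools align them and can tolerate indels.
    """
    if len(a) != len(b) or not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(reads, pct_match):
    """Collapse identical reads, then greedily cluster unique sequences.

    Each unique sequence joins the first representative it matches with
    identity >= pct_match; otherwise it becomes a new representative.
    Returns a dict mapping representative -> member sequences.
    """
    unique = sorted(set(reads), key=lambda s: (-len(s), s))  # longest first
    clusters = {}
    for seq in unique:
        for rep in clusters:
            if identity(seq, rep) >= pct_match:
                clusters[rep].append(seq)
                break
        else:
            clusters[seq] = [seq]  # new cluster with seq as its representative
    return clusters


reads = ["ACGTACGTAC"] * 5 + ["TTTTGGGGCC"] * 3 + ["ACGTACGTAA"]
print(len(greedy_cluster(reads, 100)))  # 3
print(len(greedy_cluster(reads, 90)))   # 2
```

With only unmutated copies, as in the first simulation, the number of clusters at 100% match should simply equal the number of distinct fragments; lowering the threshold starts to merge fragments that differ by a single SNP, which is one reason parameter settings matter.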

Moving forward, what are the next steps for this research?

For our study, we selected a sample of assemblers from peer-reviewed literature that use different assembly algorithms, are freely available, and have updated user resources available online. However, this was not a comprehensive evaluation of all software capable of de novo assembly. Therefore, the evaluation of other programs would be valuable. Additionally, as new software programs are introduced or existing programs are updated, continued efforts to evaluate de novo assembly performance are warranted.

What would your message be for students about to start their first research projects in this topic? 

Reduced representation genotyping-by-sequencing is becoming less expensive and more accessible, making it a viable option for more research projects. While it is exciting to apply these emerging technologies and methods, it is important to recognize that approaches to filter and analyze these large datasets are still in development. Doing your background research to ensure you are applying the best available tools and the most appropriate methods for your study is essential to good research, in this field as in any other.

What have you learned about science over the course of this project? 

Doing this study has reaffirmed the importance of simulations to test how software works. Testing analyses on simulated data and altering parameters of the simulation or analysis can provide immense insight into how the software works and how variation in real data may affect software performance. Larger simulation projects like our study can provide information that many people can use, but I also find it incredibly helpful to run a simulated dataset through an analysis before analyzing my own data to ensure I understand what the software is doing. Taking advantage of simulated datasets available in vignettes for software is a great tool to get acquainted with the analyses you plan to do.

Describe the significance of this research for the general scientific community in one sentence.

Our study demonstrates the importance of ensuring that the software you use is really doing what you think it does, and shows how simulations can help evaluate software performance.

Describe the significance of this research for your scientific community in one sentence.

Researchers who need to perform de novo assembly of reduced representation genotyping-by-sequencing data can use our study as a guide for which software to use and the importance of different parameter settings for assembly.

LaCava, M. E., Aikens, E. O., Megna, L. C., Randolph, G., Hubbard, C., & Buerkle, C. A. (2019). Accuracy of de novo assembly of DNA sequences from double‐digest libraries varies substantially among software. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13108

Interview with the authors: utilising GT‐seq for minimally invasive DNA samples

Minimally-invasive sampling is commonly used to obtain samples from rare, elusive or dangerous animals. However, this sampling technique often results in samples that are too low in quality or quantity for successful use with most high-throughput sequencing methods. Using cloacal swabs from the threatened Western Rattlesnake (Crotalus oreganus), Danielle Schmidt and colleagues show that Genotyping-in-Thousands by sequencing (GT-seq) can successfully be used to generate high-throughput sequence data from low-quality, low-quantity samples. We interviewed Danielle Schmidt (first author) and Professor Michael Russello (last author) to find out more about what went on behind-the-scenes of this study.

The Western Rattlesnake (Crotalus oreganus), a threatened species in British Columbia, Canada. Photo credit: Marcus Atkins

What led to your interest in this topic / what was the motivation for this study? 

Conservation genomics has become an increasingly common term in the literature, yet many study systems that involve elusive or at-risk species must rely on minimally- or non-invasive sampling to meet research and management objectives. Although a valuable source of biological material, DNA extracted from minimally- or non-invasive samples is typically of low quantity, poor quality, and contaminated with exogenous DNA, all of which may be incompatible with modern sequencing technologies. Implementing leading-edge genetic and genomic tools to study conservation-related questions has been a long-standing interest in the Russello Lab.

What difficulties did you run into along the way?

Based on earlier work from our lab (Russello et al., 2015, PeerJ), we suspected that a non-targeted sequencing approach like RADseq would not be efficient for collecting genotypic data from minimally-invasive samples. Therefore, we decided to test the efficacy of GT-seq (Campbell et al., 2015), as it is a targeted method that could help circumvent the typical issues involved in sequencing and genotyping lower-quality DNA. Our biggest challenge was designing a GT-seq SNP panel that minimized ascertainment bias to ensure our downstream estimates of within- and among-population variation would be accurate. Also, given the number of samples and loci we planned to analyze simultaneously, optimizing the workflow for data collection took some time.

Library designs for A) RADseq and B) GT-seq. Samples were selected to facilitate within- and among-method genotype comparisons.

What is the biggest or most surprising finding from this study? 

One of the most surprising findings was the exceptionally high genotype consistency between paired blood and cloacal swab samples genotyped with GT-seq, and between blood samples genotyped with both RADseq and GT-seq. We even found that samples with initial DNA concentrations as low as ~0.5 ng/µL amplified successfully, which is promising for future applications of GT-seq with minimally- and non-invasive DNA samples.
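Genotype consistency between paired samples is typically summarised as the proportion of loci with identical calls. Below is a minimal Python sketch, assuming genotypes are stored per locus as unordered allele pairs and that loci failing in either sample are excluded from the comparison; the study's exact calculation may differ, and the locus names and genotypes are hypothetical.

```python
def concordance(geno_a, geno_b):
    """Proportion of loci with identical genotype calls in two samples.

    Genotypes are per-locus allele pairs such as ("A", "G"); allele order is
    ignored. Loci that are missing (None or absent) in either sample are
    skipped. Returns (proportion, number_of_loci_compared).
    """
    compared = agree = 0
    for locus, ga in geno_a.items():
        gb = geno_b.get(locus)
        if ga is None or gb is None:
            continue
        compared += 1
        if sorted(ga) == sorted(gb):
            agree += 1
    proportion = agree / compared if compared else float("nan")
    return proportion, compared


# Hypothetical paired blood and cloacal swab genotypes for one snake.
blood = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "T"), "snp4": ("A", "A")}
swab = {"snp1": ("G", "A"), "snp2": ("C", "T"), "snp3": None, "snp4": ("A", "A")}
print(concordance(blood, swab))
```

Reporting the number of loci compared alongside the proportion matters for low-quality samples, where missing calls can make a high concordance rest on only a few loci.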

Moving forward, what are the next steps for this research? 

We are now exploring the application of GT-seq on a host of species to provide rapid, cost-effective genetic information to support research in molecular ecology and to assist wildlife and fisheries management. We are also testing the performance of this workflow with other non-invasive sample types, including feces and hair. Moving forward, we will be exploring ways of deploying these tools in the field to inform management decisions in real-time.

What would your message be for students about to start their first research projects in this topic?

An important message we would like to convey is to think carefully about potential biases when designing a panel of markers to target, as the composition of your panel must be tailored to your research questions. For example, some applications of GT-seq may seek to intentionally maximize the among-population component of genetic variation in order to identify individuals of unknown origin to a particular fish stock with high confidence. In other cases, as with our study, we wanted a panel that could be used to most accurately reconstruct population structure and connectivity, which we were able to subsequently validate relative to a larger RADseq dataset.
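The "maximise the among-population component" strategy can be illustrated by ranking candidate SNPs by a differentiation index and keeping the top k. The Python sketch below uses Nei's G_ST for biallelic SNPs with equally weighted populations; the SNP names and allele frequencies are hypothetical. Selecting loci this way deliberately inflates differentiation, which suits assignment panels but is the kind of ascertainment bias the authors aimed to avoid when estimating population structure, so this is not their panel design method.

```python
def gst(freqs):
    """Nei's G_ST for one biallelic SNP.

    freqs lists the frequency of one allele in each population; populations
    are weighted equally (a simplifying assumption).
    """
    hs = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    pbar = sum(freqs) / len(freqs)
    ht = 2 * pbar * (1 - pbar)  # heterozygosity expected in the pooled population
    return (ht - hs) / ht if ht > 0 else 0.0


def top_differentiating(snp_freqs, k):
    """Return the names of the k SNPs with the highest G_ST."""
    ranked = sorted(snp_freqs, key=lambda snp: gst(snp_freqs[snp]), reverse=True)
    return ranked[:k]


# Hypothetical allele frequencies in two populations.
candidates = {"snp1": [0.1, 0.9], "snp2": [0.5, 0.5], "snp3": [0.3, 0.7]}
print(top_differentiating(candidates, 2))  # ['snp1', 'snp3']
```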

What have you learned about science over the course of this project? 

This project highlighted the benefits of taking a new approach to address a long-standing challenge. In molecular ecology and conservation genetic studies, minimally-invasive sampling is commonly employed as either a required or a preferential approach for obtaining sufficient sample sizes. Yet, it has been recognized since the advent of non-invasive genetic sampling in the 1990s that issues associated with DNA quality and quantity require careful consideration and extra quality control steps. Today, these considerations also apply to the use of modern DNA sequencing technologies with suboptimal starting material; however, GT-seq provides a versatile approach for overcoming DNA quality issues and providing the population-level data needed to address research and management objectives.

Describe the significance of this research for the general scientific community in one sentence.

Multiplexed, amplicon DNA sequencing, such as that employed in GT-seq, is compatible with the minimally-invasive sampling often required for obtaining population-level data to inform biodiversity conservation.

Describe the significance of this research for your scientific community in one sentence.

GT‐seq offers an effective approach for genotyping minimally-invasive samples, providing accurate and precise estimates of within‐ and among‐population diversity metrics relative to genome-wide approaches such as RADseq.

Read the full study here:
Schmidt, Danielle A., et al. “Genotyping‐in‐Thousands by sequencing (GT‐seq) panel development and application to minimally invasive DNA samples to support studies in molecular ecology.” Molecular Ecology Resources (2020). https://doi.org/10.1111/1755-0998.13090

Summary from the authors: Individualized mating system estimation using genomic data

Mimulus guttatus

Hermaphroditic species of plants and animals can produce a mixture of outcrossed and self-fertilized offspring. Estimating the relative frequency of these two outcomes, i.e. the outcrossing rate, has been a major focus in the evolutionary study of reproductive strategies. The outcrossing rate is also a key parameter for plant breeding and for conservation efforts. This paper generalizes a Bayesian method for estimating outcrossing rate (BORICE) to genomic data. Application of the program to an experimental study of Mimulus guttatus illustrates the estimation (10% of progeny were selfed) and also shows how inference of mating system parameters can set up “downstream” evolutionary studies. In the Mimulus study, these downstream analyses included pollination biology (the genetic composition of pollen changed over the season) and local adaptation (inversion polymorphisms exhibit unique patterns of microspatial structure within the population).
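BORICE itself is a Bayesian method, and its details are not reproduced here. As a simpler point of reference, the classical population-level link between selfing and inbreeding can be sketched in Python: at equilibrium under a constant selfing rate s, the inbreeding coefficient is F = s/(2 - s), so s = 2F/(1 + F) and the outcrossing rate is t = 1 - s. This indirect estimator assumes inbreeding equilibrium and no biparental inbreeding; it is not the approach used in the paper.

```python
def inbreeding_coefficient(h_obs, h_exp):
    """F = 1 - Ho/He from observed and expected heterozygosity."""
    return 1 - h_obs / h_exp


def outcrossing_from_f(f):
    """Outcrossing rate t = 1 - s, with s = 2F / (1 + F).

    Assumes inbreeding equilibrium under a constant selfing rate s and no
    biparental inbreeding; F must lie between 0 and 1.
    """
    if not 0 <= f <= 1:
        raise ValueError("F must be between 0 and 1 for this estimator")
    s = 2 * f / (1 + f)
    return 1 - s


print(round(outcrossing_from_f(1 / 19), 6))  # 0.9
```

For example, an equilibrium F of 1/19 (about 0.053) would correspond to s = 0.1, i.e. 10% selfed progeny, showing how modest inbreeding coefficients can still reflect appreciable selfing.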

-Professor John K Kelly, University of Kansas

Full article: Colicchio, J., Monnahan, P. J., Wessinger, C. A., Brown, K., Kern, J. R., & Kelly, J. K. (2020). Individualized mating system estimation using genomic data. Molecular Ecology Resources. https://doi.org/10.1111/1755-0998.13094

Interview with the authors: quality and quantity of genetic relatedness data affect the analysis of social structure

Understanding the influence of relatedness on fine-scale social interactions within a population is fundamental to understanding the role of kinship in animal societies. In this study, Foroughirad et al. provide insight into the quality of Single Nucleotide Polymorphism (SNP) data required to obtain accurate and precise parentage assignments and relatedness coefficients, using data from a long‐term behavioural study of bottlenose dolphins with a known partial pedigree. They then explore how the quality of these estimates influences post hoc analyses of the relationship between relatedness and social structure. Again, they provide important practical guidance about the quality of data needed for these types of analyses. This article was published in Molecular Ecology Resources; read our interview with Vivienne Foroughirad, lead author of the study, below.

An adult female bottlenose dolphin with her six-month-old calf. Photo Credit: Vivienne Foroughirad

What led to your interest in this topic / what was the motivation for this study? 

In the broadest sense, my research interests concern the evolution of sociality and complex social behaviors such as cooperation. To that end, I was interested in our ability to parcel out contexts in which cooperation occurs between kin versus between unrelated individuals. Non-kin cooperation is rare in animal societies, and a common way to search for examples is to first investigate the link between the strength of social relationships and the genetic relatedness of pairs. Since genotyping-by-sequencing is now cheaper and more accessible than ever, I wanted to explore the effects this increased resolution would have on our power to test the relationship between social structure and relatedness, especially in viscous populations with strong philopatry.

What difficulties did you run into along the way? 

In our case, the greatest challenges centered on maintaining a longitudinal study of a wild marine mammal with a sample size large enough to make these types of questions feasible to answer. We were lucky to have over 30 years of data available from the Shark Bay Dolphin Project, which allowed us to verify some of the reconstructed pedigree relationships, as well as to measure detailed home range usage and social associations. One analytical difficulty we encountered was how to account for the confounding effect of philopatry, or limited dispersal, on social relationships with kin when trying to distinguish kin discrimination from more passive kin associations that are a byproduct of shared space use.

What is the biggest or most surprising finding from this study? 

We provided evidence that genotyping-by-sequencing methods could produce more precise relatedness values than typical microsatellite analyses, which isn’t surprising. What was less well understood was the effect this would have on downstream analyses, such as those testing whether relatedness correlated with social affiliation. We found that even though our study species exhibits strong, lifelong affiliative relationships between maternal kin, there were a surprising number of scenarios under which our analysis failed to detect a significant correlation between genetic relatedness and social associations. We also found, surprisingly, diminishing returns in relatedness resolution with increasing sample size (number of individuals) when small numbers of markers were used.

Moving forward, what are the next steps for this research? 

Pedigree reconstruction is rapidly improving, especially where there is access to new genetic resources such as chromosome-level assemblies for non-model organisms. Improved kin assignment methods will allow us to investigate the function of these relationships at the level of the individual, which will help us to tease out how both intra- and inter-specific variation in ecology and demography affect social behavior. Within my own study site, I’m using these data to look at the effect of family size on social network position and reproductive success, as well as the demographic conditions that facilitate the formation of non-kin bonds. We’re also working on ways to better discriminate between maternal and paternal kin, which will be important for investigating the mechanisms of kin recognition.

What would your message be for students about to start their first research projects in this topic? 

That this is a great idea! Rapid advances in technology will open up new avenues of inquiry, and there is lots of work to be done. Nevertheless, as in any field, you also need to know when to stop and submit. There will always be a new, higher-coverage genome or an updated version of the software you’re using about to be released, but if you keep reanalyzing your data with each advance, you’ll never finish a project. My second piece of advice would be to practice simulating data and analyzing it. Building simulated datasets, tweaking parameters, and testing different software has really deepened my understanding of methodologies, plus you can start before you even get your first sequencing results and be ready with a tested pipeline when you do get results in hand.

What have you learned about science over the course of this project? 

That building a robust, reproducible, and well-documented pipeline for analysis is crucial. It might take a bit more work to set up, but it’s always worth it. I also benefitted a lot from the opportunity to present my work to audiences from different disciplines which helped me keep the big picture in mind since I’m the kind of person that gets easily caught up in minutiae. Biologically, I’m always reminded that there’s so much individual variation that gets masked by conducting analyses at the population level, and that rather than being discounted as noise, that variation could be leveraged to ask really interesting questions about how ecology and demography affect behavior. 

Describe the significance of this research for the general scientific community in one sentence.

The correlation between genetic relatedness and the strength of social relationships can be masked by the limited power of typical published sample sizes.

Describe the significance of this research for your scientific community in one sentence.

We provide practical guidance for how sample sizes and sequencing methods might interact to improve precision of relatedness estimates and their effect on the analysis of social structure, using wild bottlenose dolphins as a case study.

Three juvenile male bottlenose dolphins surface synchronously. Photo Credit: Vivienne Foroughirad

Foroughirad, V., Levengood, A. L., Mann, J., & Frère, C. H. (2019). Quality and quantity of genetic relatedness data affect the analysis of social structure. Molecular Ecology Resources, 1181–1194. https://doi.org/10.1111/1755-0998.