Interview with the authors: Transposable elements mark a repeat-rich region associated with migratory phenotypes of willow warblers

In a recent paper in Molecular Ecology, Caballero-López and colleagues investigated the genetics of migratory behaviour in two subspecies of willow warbler (Phylloscopus trochilus trochilus and Phylloscopus trochilus acredula). Previous work had identified several genetic markers associated with migratory behaviour in this species, but a particularly important candidate marker could not be mapped to previous genome assemblies. This suggested to Caballero-López et al. that the marker may lie in a highly repetitive, and thus difficult to assemble, genomic region. Leveraging a recent genome assembly based on long-read technology and a quantitative PCR approach, Caballero-López et al. found that the elusive migration marker is located in a genomic region rich in remnants of transposable elements.

We sent some questions to the primary author of this work, Violeta Caballero-López, to get some more insight and details about this exciting study.

Willow warbler male, 2017. Photo credit: Harald Ris.

What led to your interest in this topic / what was the motivation for this study? 

My research aims to shed some light on our understanding of the genetics underpinning bird migration, which is currently very poor. Passerine birds migrate alone, and they follow the same routes to wintering grounds as their parents, fully relying on genetic mechanisms.

The motivation for this specific study was to try to characterize a region in the genome which varies between two subspecies of willow warbler that present differential migration to Africa. Until now, this region was only identified as an AFLP-derived marker which failed to be mapped to the genome. However, with the use of molecular techniques such as qPCR in combination with a good quality genome assembly, we could understand the nature of this element better.

AFLP: Amplified Fragment Length Polymorphism

qPCR: Quantitative Polymerase Chain Reaction

Can you describe the significance of this research for the general scientific community in one sentence?

Repeat-rich regions, which are often considered “junk DNA”, might have a larger role in phenotypes and function than previously thought.


Can you describe the significance of this research for your scientific community in one sentence?

It is important to revise the role of repeat DNA in determining a complex trait such as bird migratory routes.

Willow warbler singing in Siberia, 2017. Photo credit: Harald Ris.

What difficulties did you run into along the way? 

For more than 20 years “WW2” has been an elusive AFLP marker, observed to be fixed in the “northern” subspecies P. t. acredula. It could only be amplified in PCR as a 154 bp fragment and then sequenced, but its nature was totally unknown. The identification and curation of this sequence as a transposon (TE) was challenging because it is an old, degraded “LTR portion” of the full element. This required a willow warbler genome built with long-read sequencing techniques that provided regions of the genome rich in repeat DNA. Locating the ends of this transposon was also complicated. Alignment “breaks” serve as a detection method for the target site duplications that mark the edges of these elements. However, they could not be used in our system because these TEs appear consistently embedded within a larger block of repeats. This interfered with our estimation of the age of the repeat and our theories about its origin.

LTR: Long terminal repeat

What is the biggest or most surprising innovation highlighted in this study? 

The most surprising finding here is the presence of a large repeat-rich region (>12 Mb) that segregates in both willow warbler subspecies. This region is characterized by several copies of the WW2 derived variant, which turned out to be part of a transposable element belonging to the endogenous retrovirus family. Furthermore, we provide solid evidence of its independence from the other polymorphic regions in chromosomes 1 and 5. As this TE seems to be inactive, and no clear functional genes have been detected in its surroundings, it remains puzzling why this region correlates with migration in the willow warbler so strongly.

You end your paper by describing how it’s premature to think that the association of the WW2 derived variant has a causal role in the trait. Based on your knowledge of the warbler genome, would you care to speculate as to the actual causal basis of the phenotype?

The most supported hypothesis is that migration is a complex trait influenced by gene packages. In the case of the willow warblers, I would speculate that the repeat-rich region, and not necessarily the WW2 derived variant itself, could affect migration indirectly through 1) the formation of a structural variant in a chromosome that affects gene expression; 2) the trans-regulation from this region of some gene(s) elsewhere in the genome; 3) the presence of an adjacent gene outside this region that we have not been able to detect in the current genome assemblies so far; or 4) a missed single-copy gene within the repeat-rich region. However, the last one is the least likely given that areas with such a repeat density rarely contain functional genes.

Have you got any ideas of how you might test the hypothesis that chromosomal rearrangements were facilitated by the presence of TEs?

The most exciting possibility is to visually confirm whether these rearrangements have taken place. A way to test this empirically would be to obtain a karyotype of each subspecies and combine it with fluorescence in situ hybridization (FISH). First, a probe labelling the WW2 derived variants would signal the location of the repeat-rich region. Once the location of this region is resolved, it is possible to design several fluorescent probes outside of it to determine whether the chromosomal arrangement around it is maintained both in the genome of P. t. acredula and in the orthologous region in P. t. trochilus.

Moving forward, what are the next steps in this area of research?

The biggest mystery within this study is the location in the genome of this repeat-rich region that contains several copies of the WW2 derived variant. One of the biggest challenges of genome assemblies is the mapping and correct placement of repeat-dense sequences, and therefore future effort should be focused on obtaining empirical evidence of the location of this region. Then we could get a better idea of whether and how this region affects migration. Is it downstream or upstream of any gene complex? Is it silenced? How does its orthologue look in P. t. trochilus?

Typical working setup for the willow warbler team, 2021. Photo credit: Harald Ris.

Caballero-López, V., Lundberg, M., Sokolovskis, K., & Bensch, S. (2022). Transposable elements mark a repeat-rich region associated with migratory phenotypes of willow warblers (Phylloscopus trochilus). Molecular Ecology, 31, 1128–1141.

Interview with the authors: How genomic data reveal cryptic species and how migration patterns maintain genetic divergence in birds?

In a recent paper in Molecular Ecology, Tang et al. investigated genetic divergence of different subspecies of pale sand martin (Riparia diluta) using genome-wide data. They found that the subspecies in Central and East Asia, which vary only gradually in morphology, broadly represent three genetically differentiated lineages. No signs of gene flow were detected between two lineages that met at the eastern edge of the Qinghai-Tibetan Plateau, which is likely due to largely different breeding and migration timing. Limited mixed ancestries were found in Mongolian populations between two lineages that might take divided migration routes around the Qinghai-Tibetan Plateau, and the authors hypothesize that selection against hybrids with nonoptimal migration routes might restrict gene flow. See the full article for more details of the study and the interview with lead author Manuel Schweizer below for more stories behind this exciting work.

Pale sand martin Riparia diluta tibetana, Mongolia, June 2018. Photo Credit: Manuel Schweizer

What led to your interest in this topic / what was the motivation for this study?

I studied pale sand martins in Central Asia as part of the work on a field guide to the birds of Central Asia, which was published in 2012. I was then fascinated by the fact that the different subspecies described for this species breed in completely different environments: Central Asian steppes and semi-deserts, high-altitude grasslands on the Qinghai-Tibetan Plateau, or lowland subtropical China. Although it was evident that morphological identification of single individuals of the different subspecies without context is not possible, I suspected that cryptic diversity might be involved in this complex. This was corroborated by mtDNA data that we published in 2018. Together with Gerald Heckel and our PhD student Qindong Tang, I wanted to investigate this further using genome-wide data and test if gene flow is reduced in areas of potential contact between evolutionary lineages.

Breeding site of pale sand martin on the east edge of Qinghai-Tibetan Plateau in Zoige (Sichuan Province, China). Photo Credit: Qindong Tang

What difficulties did you run into along the way?

The biggest challenge was to get a comprehensive geographic sampling together. As pale sand martins breed only in low densities, this meant a lot of travelling. Fortunately, we could count on the great support of our collaboration partners and their network: Yang Liu from Sun Yat-sen University in Guangzhou, China, and Gombobaatar Sundev from the National University of Mongolia. Moreover, Qindong Tang made an incredible effort and did an excellent job during the fieldwork.

What is the biggest or most surprising innovation highlighted in this study?

Given the absence of obvious sexually selected traits and only gradual morphological differentiation between the different evolutionary lineages of the pale sand martin, the level of genetic differences and the fact that they behave like different species, at least at the eastern edge of the Qinghai-Tibetan Plateau, is indeed surprising. So, we were left with the following question: what processes and mechanisms prevent a complete mixing at secondary contact zones? We think that seasonal migration behavior might be an essential factor in maintaining genetic integrity of these morphologically cryptic evolutionary lineages.

Moving forward, what are the next steps in this area of research?

The next step is evident: we need to study in detail the migration behavior of the different lineages. The ranges of two of them meet in the area of a well-known avian migratory divide, where western lineages take a western migration route around the Qinghai-Tibetan Plateau to winter quarters in South Asia, and eastern lineages take an eastern route to Southeast Asia. This might also be the case in the pale sand martins, and we hypothesize that hybrids might have nonoptimal intermediate migration routes and that selection against them might restrict gene flow. This will need quite some field work and the application of up-to-date technologies such as modern data loggers. Let’s hope that the development of the pandemic will allow field work again soon.

What would your message be for students about to start developing or using novel techniques in Molecular Ecology?

It is best to get started and not be intimidated or even afraid. The easiest way to learn new methods is to start using them. It is also important to build a network of people who can be asked for support when problems arise.

What have you learned about methods and resource development over the course of this project?

As always in studies with a phylogeographic background, sampling matters most. Try to organize a complete geographic sampling at the beginning of a project. Sampling in parts of the distribution area of our study system was planned for the third year of Qindong’s PhD; however, this could not be achieved due to the pandemic. As a consequence, we still lack samples from western Mongolia, which would have been important and would have made the overall picture more comprehensive. This work did not include any development of new methods; however, a knowledge of state-of-the-art methodological approaches is obviously always crucial.

Describe the significance of this research for the general scientific community in one sentence.

Our study points towards contrasting migration behavior as an important factor in maintaining evolutionary diversity under morphological stasis.

Describe the significance of this research for your scientific community in one sentence.

Our discovery of cryptic diversity in the pale sand martin indicates that evolutionary diversity might be underestimated even in well-studied groups such as birds, and it suggests that it is worth having a closer look at widespread species occurring in different environments.

Photo of the first author Qindong Tang during the field work. Photo Credit: Qin Huang
Sampling team in Qinghai, PR China, June 2016. From left to right: Manuel Schweizer, Paul Walser Schwyzer, Yang Liu, Qin Huang, Yun Li and our driver. Photo Credit: Manuel Schweizer
Sampling team in Mongolia, June 2018. From left to right: Tuvshin Unenbat, Turmunbaatar Damba, Gombobaatar Sundev, Paul Walser Schwyzer, Manuel Schweizer, Silvia Zumbach, Sarangua Bayrgerel. Photo Credit: Manuel Schweizer

Tang Q, Burri R, Liu Y, Suh A, Sundev G, Heckel G, Schweizer M. 2022. Seasonal migration patterns and the maintenance of evolutionary diversity in a cryptic bird radiation. Molecular Ecology. https://doi.org/10.1111/mec.16241.