'We Explore Areas Where Much Remains Unknown'
Computational methods for analysing ancient and modern genomes make it possible to study the formation of genetic diversity in populations, reconstruct their history of mixing and migration, and trace the development of environmental adaptations. The HSE International Laboratory of Statistical and Computational Genomics applies mathematical methods and genetic data to address a wide range of issues across fields such as anthropology, epidemiology, and criminology. The HSE News Service spoke with the laboratory head, Vladimir Shchur, about its work.
— When was the laboratory established?
— In the summer of 2019, shortly after my return from UC Berkeley, HSE University announced a competition to establish international laboratories. Together with Rasmus Nielsen, my academic supervisor at UC Berkeley and one of the leading experts in population genomics and statistics, we decided to take part in the competition, and our project was approved. We began our work in early 2020, with Rasmus serving as the laboratory’s scientific director. Unfortunately, due to the pandemic and the global situation, he has not yet been able to visit Moscow in person. Nevertheless, we continue to collaborate actively and work on joint projects.
— What are the main activities of the laboratory?
— Our initial plan was to focus on human population and evolutionary genomics, developing new mathematical methods for analysing genetic data to study human genomic history. This is basic research. However, when the pandemic began, we shifted part of our efforts to studying the genomic epidemiology of the SARS-CoV-2 virus, including its rate of spread and the directions of its transmission based on viral genomes. By now, a strong team specialising in viral and bacterial evolution has formed in the laboratory. In addition, our research interests extend to applications of population genomics in medicine, criminology, and agriculture.
— Among the laboratory’s key research topics are population evolution modelling and the assessment of population size and migration activity. What are you focusing on in particular?
— Essentially, we are trying to reconstruct the past. There are different timescales we can examine, from millions of years ago to the most recent ten generations. For example, we study the consequences of archaic introgression: the mixing of Neanderthals, a parallel branch of ancient humans that became extinct about 40,000 years ago, with the ancestors of modern humans. This is particularly interesting because all people of non-African descent carry 1–2% (and sometimes more) Neanderthal ancestry in their genomes. Moreover, some populations in Asia and Oceania also have genetic components inherited from Denisovans (named after the Denisova Cave in the Altai Mountains, Siberia, where the first remains were found. Ed.).
In our laboratory, we have developed a set of methods for studying archaic introgression. For example, these methods increase the accuracy and reduce the effort required to classify archaic genomic sites as either Neanderthal or Denisovan. However, the most fascinating application is reconstructing the complex history of the origin of Neanderthal segments in modern populations, such as Mexicans. Our approach makes it possible to account simultaneously for both archaic introgression and modern admixture, that is, the formation of populations on the American continent in the post-Columbian period.
— What were you able to find out?
— In Mexicans, roughly half of the genome is of European origin, about 45% comes from the indigenous population, and around 5% comes from Africa. Admixture among these groups has continued for approximately 20 generations. We were the first in the world not only to identify Neanderthal-derived sites in modern Mexican genomes, but also to predict whether these sites were inherited through European or indigenous ancestry.

Why does this matter? When Europeans moved to the Americas and began mixing with the local population, the Neanderthal-derived elements in their genomes entered a new environment. Over a relatively short historical period, we can observe the action of natural selection on these specific genetic components. For example, if 5% of Europeans carry Neanderthal ancestry in a particular region of the genome, and Mexicans show an increase to 20% in the same region, this strongly suggests that the region may confer an evolutionary advantage in Mexicans. While we cannot determine the exact biological causes and mechanisms, we can suggest to experimentalists that this region is worth investigating for potential functional significance.
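The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, not the laboratory's method: it assumes that, without selection, the frequency of Neanderthal ancestry in a region of an admixed genome is roughly the average of the source-population frequencies weighted by the admixture proportions quoted above. The 5% European and 20% Mexican figures come from the example; the indigenous and African frequencies, and the function name `expected_archaic_frequency`, are hypothetical.

```python
def expected_archaic_frequency(source_freqs, admixture_props):
    """Neutral expectation: admixture-weighted average of source frequencies."""
    total = sum(admixture_props.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("admixture proportions must sum to 1")
    return sum(admixture_props[pop] * source_freqs[pop] for pop in admixture_props)


# Approximate admixture proportions quoted in the interview.
admixture_props = {"european": 0.50, "indigenous": 0.45, "african": 0.05}

# Frequency of Neanderthal ancestry in one genomic region per source population.
# Only the European value (5%) comes from the text; the others are assumptions.
source_freqs = {"european": 0.05, "indigenous": 0.05, "african": 0.0}

expected = expected_archaic_frequency(source_freqs, admixture_props)
observed = 0.20  # frequency in Mexicans in the interview's example

# Ratio of observed to expected frequency; values far above 1 flag the region.
enrichment = observed / expected
print(f"expected={expected:.4f}, observed={observed:.2f}, enrichment={enrichment:.2f}")
```

A large ratio only marks the region as a candidate. A real analysis would also have to rule out random genetic drift and sampling noise before suggesting selection.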
— How does natural selection shape genetic diversity? Which environmental factors have the greatest influence on genes?
— A well-known example of environmental adaptation is a variant of the EPAS1 gene in Tibetans. Inherited from Denisovans, this variant enables Tibetans to thrive at high altitudes. Another example is dietary adaptation. Europeans, for instance, developed the ability to digest lactose, while Nordic populations adapted to high-fat diets in extreme environments. In Oceania, members of a diving community have enlarged spleens, which enable them to hold their breath and stay underwater for unusually long periods.
— What did you discover in your research on COVID-19 and other infections?
— Together with Georgii Bazykin from Skoltech, who studies pathogen evolution, we published a paper on the early spread of COVID-19 in Russia. We found that most of the virus introductions came from Europe and coincided with Russians traveling during the holidays on February 23 and March 8.
We had 211 coronavirus genomes at our disposal, collected in Russia between March and April 2020. According to our estimates, these genomes resulted from more than 60 separate introductions. Interestingly, only one of these originated from China, where the virus first emerged. In contrast, clusters introduced from Europe established themselves and began to spread actively across the country.
An interesting case involves the Delta variant of COVID-19, which was introduced into Russia in the summer of 2021. One might expect it to have been imported multiple times, but it turned out that nearly all cases traced back to a single introduction. The 'founder effect' was at work—one particularly 'lucky' individual brought the virus to Russia, and within just two weeks, 90% of cases originated from that single carrier arriving in Moscow.
Following our work on COVID-19, genomic epidemiology has become a major focus of our research. We study the genetics of pathogens that cause tuberculosis, influenza, hepatitis, and encephalitis.
We could say that we study how virus populations adapt to humans. For example, we investigated the long-term course of COVID-19 in an individual with immunodeficiency, who harboured an entire viral population capable of mutating and evolving over time.
For detailed studies, we need databases containing both viral genomes and human genomes, particularly those of people exhibiting symptoms of pathogen-induced diseases.
— How actively are HSE University’s students and doctoral students involved in the laboratory?
— We have six experienced senior researchers, while the remaining 15 team members—junior research fellows and research assistants—are students, including doctoral students. Indeed, even bachelor’s students in our laboratory have published in top-tier A-list journals. We treat students as full-fledged colleagues and assign them real research tasks. Students from the Faculty of Computer Science focus on projects emphasising mathematical modelling and programming, while students from the Faculty of Biology work on processing experimental data. It all depends on their field of study and individual interests. All of our research is interdisciplinary: even in a mathematical modelling project, a student will eventually encounter data analysis, and a biology student will likely work with statistics. The main thing is for a student to work diligently, and then their term paper can evolve into a publishable article in a reputable journal.
— Are the results of your work applied in the educational process?
— In September, we launched a new track, 'Data Analysis for Life Sciences,' within the Applied Mathematics and Information Science programme. This track focuses on evolution, combining evolutionary biology with probabilistic modelling and data analysis.

— Tell us about the conference in China attended by several members of your laboratory. Where was it held, and what was its theme?
— The SMBE Conference, organised by the international Society for Molecular Biology and Evolution, is the largest event in this field.
This year, 1,200 scientists participated in the conference, which was held in Beijing during the summer. It was a fantastic opportunity to reconnect with international colleagues we hadn’t seen in a long time. We confirmed that our research is at the forefront of global science. Our work prompted many questions, a good deal of feedback, and interest in applying our methods. Our laboratory presented three contributions: two oral presentations by Anfisa Popova and myself, and one poster by Galina Klink.
— Can we say that participating in conferences allows Russian scientists to maintain regular interaction with their foreign colleagues?
— For me, definitely yes. It provides an opportunity to communicate personally with many colleagues, discuss challenges that have slowed down research and publications, and identify new avenues for collaboration.
— How do you see the prospects of your work?
— On the one hand, we are aware of the current challenges, as we are expected to focus on applied and commercialised developments. On the other hand, our team consists of researchers dedicated to fundamental science and convinced of its value. The nature of basic research is that we explore areas where much remains unknown. We cannot predict in advance what we will discover or when the knowledge we gain will translate into in-demand technology. At the same time, our scientific interests and expertise are also linked to socially significant challenges, such as studying drug resistance in Mycobacterium tuberculosis and genomic epidemiology. Our plan is therefore to maintain a high level of basic research while simultaneously advancing applied areas.