Large Language Models No Longer Require Powerful Servers

Scientists from Yandex, HSE University, MIT, KAUST, and ISTA have made a breakthrough in optimising LLMs. Yandex Research, in collaboration with leading science and technology universities, has developed a method for rapidly compressing large language models (LLMs) without compromising quality. Now, a smartphone or laptop is enough to work with LLMs—there's no need for expensive servers or high-powered GPUs.
This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs are more accessible not only to large corporations, but also to smaller companies, non-profit laboratories and institutes, as well as individual developers and researchers.
Previously, running a language model on a smartphone or laptop required first quantising it on an expensive server, a process that could take anywhere from a few hours to several weeks. Quantisation can now be performed directly on a smartphone or laptop in just a few minutes.
Challenges in implementing LLMs
The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.
The new method reduces the model's size while maintaining its quality, making it possible to run on more accessible devices. It can also compress very large models, such as DeepSeek-R1 with 671 billion parameters and Llama 4 Maverick with 400 billion parameters. Until now, models of this size could only be quantised with basic methods, which caused significant quality loss.
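To give a rough sense of scale, the short sketch below estimates how much storage the weights of the two models named above would need at different precisions. The parameter counts come from the text. The bit widths (16, 4 and 3 bits per weight) are illustrative assumptions rather than figures reported for the method, and the estimate covers the weights only, ignoring activations, caches and any quantisation metadata.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts as quoted in the article.
MODELS = {
    "DeepSeek-R1": 671e9,
    "Llama 4 Maverick": 400e9,
}

# Bit widths are assumptions chosen for illustration only.
BIT_WIDTHS = (16, 4, 3)

for name, num_params in MODELS.items():
    for bits in BIT_WIDTHS:
        size = weight_memory_gb(num_params, bits)
        print(f"{name}: about {size:,.1f} GB at {bits} bits per weight")
```

Under these assumptions, going from 16 to 4 bits per weight cuts the weight storage to a quarter, which is the kind of reduction that moves a model from server-class hardware towards more accessible devices.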
The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and the social sphere. Startups and independent developers can now implement compressed models to create innovative products and services without the need for costly hardware investments. Yandex is already applying the new method for prototyping—creating working versions of products and quickly validating ideas. Testing compressed models takes less time than testing the original versions.
Key details of the new method
The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.
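The idea behind the name can be illustrated with a heavily simplified sketch. It is not the authors' implementation. It only shows the two ingredients the name refers to: a randomised Hadamard rotation that makes a weight vector look roughly Gaussian, followed by rounding to a fixed grid designed for Gaussian data. The function names, the scalar 4-level grid (approximate classical Lloyd-Max values for a standard normal), the single per-vector scale and the test vector are all assumptions made for illustration. No calibration data is used at any point.

```python
import math
import random

# Approximate 4-level (2-bit) MSE-optimal scalar grid for a standard normal
# variable. Illustrative assumption: a real implementation may use other grids.
GRID = (-1.510, -0.4528, 0.4528, 1.510)


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) scaling, the transform is its own inverse.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantise(weights, seed=0):
    """Rotate with random signs plus Hadamard, then snap to the nearest grid level."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in weights]
    rotated = fwht([w * s for w, s in zip(weights, signs)])
    # One scale per vector (root mean square), so the grid sees unit variance.
    scale = math.sqrt(sum(v * v for v in rotated) / len(rotated)) or 1.0
    codes = [
        min(range(len(GRID)), key=lambda k: abs(v / scale - GRID[k]))
        for v in rotated
    ]
    return codes, scale, signs


def dequantise(codes, scale, signs):
    """Look up grid levels, undo the Hadamard rotation, then undo the sign flips."""
    rotated = [GRID[c] * scale for c in codes]
    return [v * s for v, s in zip(fwht(rotated), signs)]


# Hypothetical weight vector used only to exercise the functions.
_rng = random.Random(1)
weights = [_rng.gauss(0.0, 0.02) for _ in range(64)]

codes, scale, signs = quantise(weights)
restored = dequantise(codes, scale, signs)

error = math.sqrt(sum((a - b) ** 2 for a, b in zip(weights, restored)))
norm = math.sqrt(sum(a * a for a in weights))
print(f"relative reconstruction error: {error / norm:.3f}")
```

Each weight is stored as a 2-bit code plus one shared scale, and the grid is fixed in advance instead of being fitted to data, which is what makes this kind of scheme data-free.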
The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Austrian Institute of Science and Technology (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia), all contributed to the development of the method.
The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.
Response from the academic community, and other methods
The paper describing the new method has been accepted for presentation at one of the largest AI conferences in the world, the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from April 29 to May 4, 2025, in Albuquerque, New Mexico, USA, and Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has been cited by Red Hat AI, an American software company, as well as Peking University, Hong Kong University of Science and Technology, Fudan University, and others.
Previously, scientists from Yandex presented 12 studies focused on LLM quantisation. The company aims to make the application of LLMs more efficient, less energy-consuming, and accessible to all developers and researchers. For example, the Yandex Research team has previously developed LLM compression methods that reduce computational costs by a factor of nearly eight without significantly compromising the quality of the neural network’s responses. The team has also developed a solution that allows a model with 8 billion parameters to run on a regular computer or smartphone through a browser interface, even without major computational power.