DeepSeek unveils mHC method to cut costs and scale large language models
Chinese AI firm DeepSeek has published a paper on arXiv describing Manifold-Constrained Hyper-Connections (mHC), a model-architecture technique the company says could let engineers build and scale large language models without the huge computational costs such scaling normally requires.
The mHC approach builds on hyper-connections (HCs), an architecture introduced in 2024 that gives neural network layers additional parallel channels for sharing information, at the cost of higher memory use and a risk of signal degradation. DeepSeek's paper, whose authors include CEO Liang Wenfeng, proposes constraining those connections so a model keeps the richer information flow of HCs while reducing the memory overhead.
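For readers who want a concrete picture, the sketch below is an illustrative, hand-rolled interpretation of the idea rather than DeepSeek's published method: it carries several parallel residual streams through a layer and constrains the learned mixing matrix between them (here with a simple softmax row-normalization, an assumption chosen for clarity) so the streams can neither blow up nor collapse as depth grows. All class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn


class ConstrainedHyperConnectionBlock(nn.Module):
    """Toy layer with several parallel residual streams and a constrained mixer.

    Illustrative only; not DeepSeek's mHC implementation or its exact constraint.
    """

    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        self.n_streams = n_streams
        # Learnable logits for mixing the parallel residual streams.
        self.mix_logits = nn.Parameter(torch.zeros(n_streams, n_streams))
        # A standard feed-forward sublayer applied to an aggregated view.
        self.sublayer = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq_len, d_model)
        # Constraint (assumed for illustration): softmax keeps each row of the
        # mixing matrix on the probability simplex, so every output stream is a
        # convex combination of input streams and activation scale stays bounded.
        mix = torch.softmax(self.mix_logits, dim=-1)
        mixed = torch.einsum("ij,jbsd->ibsd", mix, streams)
        # Compute one update from the averaged streams and add it back,
        # preserving a residual path on every stream.
        update = self.sublayer(mixed.mean(dim=0))
        return mixed + update.unsqueeze(0)


# Usage: replicate the input into n_streams copies at the network's entry;
# a full model would average them back into one representation at the output.
x = torch.randn(2, 16, 64)                       # (batch, seq_len, d_model)
streams = x.unsqueeze(0).expand(4, -1, -1, -1)   # four parallel residual streams
block = ConstrainedHyperConnectionBlock(d_model=64, n_streams=4)
out = block(streams)                             # shape: (4, 2, 16, 64)
```

The extra streams are where both the expressive benefit and the memory cost of hyper-connections come from; the constraint on the mixing step is what, in spirit, mHC adds on top.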
The research addresses a core challenge for LLMs: signals can attenuate or turn into noise as they pass through many layers. DeepSeek frames mHC as a way to better balance a model’s plasticity and stability so signals survive across deeper networks.
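To make the attenuation problem concrete, the toy script below (a generic illustration unrelated to DeepSeek's models, with arbitrary sizes and scaling factors) pushes a random vector through 64 layers with and without an identity-preserving skip path and prints the resulting signal norms.

```python
import torch

torch.manual_seed(0)
d, depth = 512, 64
x = torch.randn(d)
plain, with_skip = x.clone(), x.clone()
# Random layer weights scaled slightly below the variance-preserving point,
# so a purely feed-forward stack shrinks the signal a little at every layer.
weights = [0.9 * torch.randn(d, d) / d**0.5 for _ in range(depth)]
for w in weights:
    plain = torch.tanh(w @ plain)                              # no identity path
    with_skip = with_skip + 0.1 * torch.tanh(w @ with_skip)    # residual path
print(f"input norm:            {x.norm().item():.2f}")
print(f"after {depth} plain layers:  {plain.norm().item():.4f}")   # nearly vanished
print(f"after {depth} skip layers:   {with_skip.norm().item():.2f}")  # same order as input
```

Residual connections are the standard fix for this; hyper-connections generalize them with multiple streams, and mHC is aimed at keeping that generalization stable and affordable.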
DeepSeek drew attention last year with R1, a model said to rival top offerings at lower training cost. The company has delayed the planned release of its follow-up model, R2, which observers say could use the mHC framework; no new release date has been announced.
By limiting the memory overhead of hyper-connections while keeping their informational benefits, mHC could make highly complex models more practical for smaller, less well-funded developers and may influence how future frontier models are built.
Key Topics
AI, United States, Tech, DeepSeek, LLMs, Model Training, Hyper-Connections