DeepSeek unveils mHC method to cut costs and scale large language models

Chinese AI firm DeepSeek has published a paper on arXiv describing Manifold-Constrained Hyper-Connections (mHC), a training architecture the company says could let engineers build and scale large language models without the huge computational costs normally required.

The mHC approach builds on hyper-connections (HCs), introduced in 2024, which give neural network layers more channels to share information but also increase memory use and risk signal degradation. DeepSeek’s paper — whose authors include CEO Liang Wenfeng — proposes constraining those connections to preserve information complexity while reducing memory costs.
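
For a concrete, if simplified, picture of the underlying idea, the PyTorch sketch below keeps several parallel residual streams per block and mixes them with learnable read and write weights, which is the basic hyper-connections design. The softmax constraint on those weights is only a hypothetical stand-in for the sort of constraint the paper describes; the class name, stream count, and inner MLP are illustrative assumptions rather than details from DeepSeek's paper.

```python
import torch
import torch.nn as nn


class HyperConnectionBlock(nn.Module):
    """One block with several parallel residual streams (a simplified sketch).

    Instead of a single residual stream, the block keeps `n_streams` copies
    of the hidden state. Learnable weights decide how the streams are mixed
    into the layer's input ("read") and how the layer's output is written
    back to each stream ("write"). With `constrain=True` the mixing weights
    are renormalized onto the simplex -- a hypothetical stand-in for the
    constraint described in the article, not DeepSeek's formulation.
    """

    def __init__(self, d_model: int, n_streams: int = 4, constrain: bool = True):
        super().__init__()
        self.constrain = constrain
        # The layer itself -- just an MLP here for illustration.
        self.layer = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Read weights: how much each stream contributes to the layer's input.
        self.read = nn.Parameter(torch.ones(n_streams) / n_streams)
        # Write weights: how strongly the output is added back to each stream.
        self.write = nn.Parameter(torch.ones(n_streams) / n_streams)

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        read, write = self.read, self.write
        if self.constrain:
            # Keep the mixing weights on the simplex so the combined signal
            # can neither blow up nor collapse as blocks are stacked.
            read = torch.softmax(read, dim=0)
            write = torch.softmax(write, dim=0)
        x = torch.einsum("s,sbtd->btd", read, streams)  # mix streams into one input
        out = self.layer(x)
        # Write the layer's output back into every stream, residual-style.
        return streams + write.view(-1, 1, 1, 1) * out.unsqueeze(0)


if __name__ == "__main__":
    block = HyperConnectionBlock(d_model=64, n_streams=4)
    h = torch.randn(1, 8, 64).expand(4, -1, -1, -1)  # replicate the input across streams
    print(block(h).shape)  # torch.Size([4, 1, 8, 64])
```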

The research addresses a core challenge for LLMs: signals can attenuate or turn into noise as they pass through many layers. DeepSeek frames mHC as a way to better balance a model’s plasticity and stability so signals survive across deeper networks.
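
The attenuation problem itself is easy to see in a toy experiment. The short script below is a generic illustration rather than anything from the paper (the width, depth, and 0.9 scaling are arbitrary choices): it pushes a random signal through 64 layers that each shrink it slightly, and without a residual-style path the signal's norm collapses toward zero, while the residual version stays at roughly its original size.

```python
import torch

torch.manual_seed(0)
d, depth = 512, 64
x = torch.randn(d)

h_plain = x.clone()  # plain stack: each layer replaces the signal outright
h_res = x.clone()    # residual stack: each layer only adds a small update
for _ in range(depth):
    # Random layer scaled so it shrinks a typical signal by about 10%.
    w = 0.9 * torch.randn(d, d) / d**0.5
    h_plain = w @ h_plain              # signal shrinks layer after layer
    h_res = h_res + 0.1 * (w @ h_res)  # residual path keeps the signal alive

print(f"input norm:     {x.norm().item():.2f}")
print(f"plain stack:    {h_plain.norm().item():.4f}")  # collapses toward zero
print(f"residual stack: {h_res.norm().item():.2f}")    # same order of magnitude as the input
```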

DeepSeek drew attention last year with R1, a model said to rival top offerings at lower training cost. The company has delayed the planned release of its follow-up model, R2, which observers say could use the mHC framework; no new release date has been announced.

By limiting the memory overhead of hyper-connections while keeping their informational benefits, mHC could make highly complex models more practical for smaller, less well-funded developers and may influence how future frontier models are built.

