Computer Engineering and Applications ›› 2025, Vol. 61 ›› Issue (22): 20-35. DOI: 10.3778/j.issn.1002-8331.2502-0161

• Hot Topics and Reviews •

Computational Optimization Methods for Mixture of Experts Under Computational Power Constraints: Status and Progress

WEN Jialin, LI Xiaojun, YAO Junping, GU Hongyang   

  1. School of Operational Support, Rocket Force University of Engineering, Xi’an 710025, China
  • Online: 2025-11-15  Published: 2025-11-14

Abstract: Large language models have achieved remarkable results in natural language processing and other fields in recent years, and mixture of experts (MoE) models reduce their computational demands through sparse activation strategies. As the inference tasks faced by MoE grow increasingly complex, expert models deployed on edge devices often have resource requirements that exceed the computing power of their nodes, so computational optimization of MoE under computational power constraints has become a persistent research focus in the field. This paper introduces the concept and architecture of MoE, then categorizes and reviews relevant optimization methods across three dimensions: gating networks, expert structures and models, and memory management. At the gating network level, routing design, loss function optimization, and load balancing mechanisms are examined as the means of achieving precise routing. At the expert structure level, structural innovations are summarized, including various expert designs, preprocessing methods, and expert merging strategies. At the memory management level, existing parameter compression and memory offloading techniques are reviewed as responses to the resource constraints models face at deployment. Finally, the paper analyzes the principles, strategies, and key technical challenges of computational optimization in each dimension, and identifies critical issues and potential research opportunities that merit further attention.
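
For readers unfamiliar with MoE routing, the sketch below illustrates the top-k gating and load-balancing mechanisms the abstract refers to. It follows the Switch Transformer / GShard style of auxiliary loss rather than any specific method covered in this survey; the tensor shapes, the names (top_k_gating, w_gate), and the top-1 dispatch used in the loss are illustrative assumptions.

```python
# Minimal sketch (assumed names and shapes) of top-k MoE gating with a
# Switch-Transformer-style load-balancing auxiliary loss.
import torch
import torch.nn.functional as F

def top_k_gating(x, w_gate, k=2):
    """x: [tokens, d_model]; w_gate: [d_model, n_experts]."""
    logits = x @ w_gate                       # router logits, [tokens, n_experts]
    probs = F.softmax(logits, dim=-1)         # router probabilities
    topk_p, topk_idx = probs.topk(k, dim=-1)  # sparse activation: k experts per token

    # Load-balancing loss: fraction of tokens dispatched to each expert
    # times the mean router probability per expert, scaled by the number
    # of experts; it is minimized when expert usage is uniform.
    n_experts = w_gate.shape[1]
    dispatch = F.one_hot(topk_idx[:, 0], n_experts).float()  # top-1 assignments
    load = dispatch.mean(dim=0)               # f_i: token fraction per expert
    importance = probs.mean(dim=0)            # P_i: mean gate probability per expert
    aux_loss = n_experts * (load * importance).sum()
    return topk_p, topk_idx, aux_loss

# Illustrative usage: 8 tokens, hidden size 16, 4 experts.
x = torch.randn(8, 16)
w_gate = torch.randn(16, 4)
weights, experts, aux = top_k_gating(x, w_gate, k=2)
```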

Key words: mixture of experts (MoE), computational optimization, load balancing, expert structure, memory management
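
As a companion to the memory management dimension above, the hypothetical sketch below shows the basic idea behind memory offloading for MoE inference: expert weights reside in host (CPU) memory and are copied to the accelerator only when the router selects them, with a small device-side cache. The class name OffloadedExperts, the FIFO eviction policy, and the cache size are assumptions for illustration, not techniques attributed to any surveyed work.

```python
# Hypothetical sketch of on-demand expert offloading (assumed names,
# FIFO eviction): expert weights stay in CPU memory and are moved to
# the accelerator only when the router selects them.
import torch
import torch.nn as nn

class OffloadedExperts:
    def __init__(self, experts, device="cuda", cache_size=2):
        self.experts = list(experts)   # nn.Module experts, kept on CPU
        self.device = device
        self.cache = {}                # expert index -> module on device
        self.cache_size = cache_size

    def __call__(self, idx, x):
        if idx not in self.cache:
            if len(self.cache) >= self.cache_size:
                evicted = next(iter(self.cache))   # evict oldest cached expert
                self.cache.pop(evicted).to("cpu")  # return its weights to host
            self.cache[idx] = self.experts[idx].to(self.device)
        return self.cache[idx](x)

# Illustrative usage: four small FFN experts; route a token batch to expert 1.
experts = [nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
           for _ in range(4)]
moe = OffloadedExperts(experts, device="cuda" if torch.cuda.is_available() else "cpu")
y = moe(1, torch.randn(8, 16).to(moe.device))
```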