【SLAI Seminar】Two Perspectives on Muon for Deep Learning Optimization: Preconditioning and Isotropic Curvature(Jan 8, 10:00)

2026-01-07 Academic Forum

You are cordially invited to attend the seminar "Two Perspectives on Muon for Deep Learning Optimization: Preconditioning and Isotropic Curvature", starting at 10:00 AM on January 8 (tomorrow).

Speaker: Prof. Weijie Su (UPenn)

Host: Prof. Ruoyu Sun

Mode of Participation: Hybrid

On-site: B411 Lecture Hall

Remote: Tencent Meeting (Meeting ID: 792-529-745)


About the Speaker: 

Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning (PRiML) Center. He received his Ph.D. from Stanford University in 2016 and his bachelor's degree from Peking University in 2011. His research interests span the mathematical foundations of generative AI, privacy-preserving machine learning, optimization, and high-dimensional statistics. He serves as a founding co-editor of the new journal Statistical Learning and Data Science and as an associate editor of the Journal of Machine Learning Research, the Journal of the American Statistical Association, Operations Research, the Journal of the Operations Research Society of China, the Annals of Applied Statistics, Foundations and Trends in Statistics, and Harvard Data Science Review. He is currently on the Organizing Committee of ICML 2026 as the Scientific Integrity Chair. His work has been recognized with several awards, such as the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, ICBS Frontiers of Science Award in Mathematics, IMS Medallion Lectureship, and Outstanding Young Talent Award in the 2025 China Annual Review of Mathematics. He is a Fellow of the IMS.

Abstract:

Introduced in December 2024, Muon is an optimization method for training language models that updates the weights along the direction of an orthogonalized gradient. Its advantages were quickly recognized and have been demonstrated on industry-scale models; for example, it has been used successfully to train a trillion-parameter frontier language model. In this talk, we offer two perspectives that shed light on this matrix-gradient method. First, we introduce a unifying framework that precisely distinguishes between preconditioning for curvature anisotropy (as in Adam) and for gradient anisotropy (as in Muon). This perspective not only offers new insights into Adam's instabilities and Muon's accelerated convergence but also leads to new extensions, such as PolarGrad. Next, we introduce a second perspective based on an isotropic curvature model. We derive this model by assuming isotropy of curvature (including Hessian and higher-order terms) across all perturbation directions. We show that, under a general growth condition, the optimal update is one that makes the gradient's spectrum more homogeneous, that is, one that brings its singular values closer in ratio. We then show that the orthogonalized gradient becomes optimal for this model when the curvature exhibits a phase transition in growth. Taken together, these results suggest that the gradient orthogonalization employed in Muon is directionally correct but may not be strictly optimal, and we will discuss how to leverage this model to design new optimization methods. This talk is based on arXiv:2505.21799 and arXiv:2511.00674.
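For readers unfamiliar with the term, the following is a rough sketch of what "orthogonalized gradient" means for a matrix-shaped gradient G = U S V^T: the update direction is the polar factor U V^T, whose nonzero singular values all equal 1. This is not the speaker's code or Muon's actual implementation. The function names, the learning rate, the sample gradient, and the use of the classical cubic Newton-Schulz iteration are all assumptions made for illustration, and the code is written in plain Python only to stay dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(grad, steps=30):
    """Approximate the polar factor U V^T of a full-rank grad = U S V^T.

    Assumption: uses the classical cubic Newton-Schulz iteration
    X <- 1.5 X - 0.5 X X^T X, which pushes each nonzero singular
    value of X toward 1 while leaving U and V unchanged.
    """
    fro = math.sqrt(sum(v * v for row in grad for v in row))
    # After scaling by the Frobenius norm, all singular values lie in (0, 1].
    x = [[v / fro for v in row] for row in grad]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return x


def muon_like_step(weight, grad, lr=0.1):
    """Hypothetical update: move the weight along the orthogonalized gradient."""
    direction = orthogonalize(grad)
    return [[w - lr * d for w, d in zip(rw, rd)] for rw, rd in zip(weight, direction)]


# Hypothetical 2x3 gradient whose two singular values differ by roughly 3x.
grad = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.5]]
direction = orthogonalize(grad)

# For an orthogonalized 2x3 matrix, direction @ direction^T is close to the 2x2 identity.
gram = matmul(direction, transpose(direction))

weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_weight = muon_like_step(weight, grad)
```

In this sketch the raw gradient has unequal singular values, while the orthogonalized direction has them all equal, which is the extreme case of the "more homogeneous spectrum" described in the abstract.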
