Announcement | SLAI Seminar No. 18
Title
从数据特征理解深度学习
Understanding Deep Learning Through the Lens of Data Characteristics
About the Speaker

许志钦教授
Prof. Zhiqin Xu
许志钦,上海交通大学自然科学研究院/数学科学学院教授。2012年本科毕业于上海交通大学致远学院。2016年博士毕业于上海交通大学,获应用数学博士学位。2016年至2019年,在纽约大学阿布扎比分校和柯朗研究所做博士后。研究兴趣是深度学习的数学基础与应用。在大模型方面,发现复杂度对大模型记忆和推理影响的机制。在深度学习基础研究方面,与合作者共同发现深度学习中的频率原则、参数凝聚和能量景观嵌入原则,发展多尺度神经网络等。在AI for Science,主要是在燃烧化学反应方面,与合作者共同发展基于深度学习的机理简化方法和基于深度学习的替代模型加速燃烧模拟。
Zhiqin Xu is a professor at the Institute of Natural Sciences/School of Mathematical Sciences at Shanghai Jiao Tong University. Prof. Xu graduated from the Zhiyuan College of Shanghai Jiao Tong University in 2012 with a bachelor's degree and earned his Ph.D. in Applied Mathematics from Shanghai Jiao Tong University in 2016. From 2016 to 2019, Prof. Xu conducted postdoctoral research at New York University Abu Dhabi and the Courant Institute of Mathematical Sciences. His research interests focus on the mathematical foundations and applications of deep learning. In the field of large models, he has discovered mechanisms through which complexity influences the memorization and reasoning of large models. In fundamental deep learning research, in collaboration with others, Prof. Xu has uncovered principles such as the frequency principle, parameter condensation, and energy landscape embedding in deep learning, and has developed multi-scale neural networks. In the field of AI for Science, particularly in combustion chemistry, Prof. Xu has collaborated with others to develop deep learning-based methods for mechanism reduction and surrogate models to accelerate combustion simulations.
Abstract
理解深度学习在实际问题中的性能需要考虑模型特征、数据特征以及连接这两部分的优化算法的特征。该报告将从函数频率、有效复杂度、信噪比、推理复杂度、关联统计量等角度来分析数据特征,并设计实验来挖掘模型和优化的特征,以理解深度学习的泛化能力和语言模型的推理能力,并对实际的模型训练提供一些参考。我们发现小初始化会使模型更偏好推理的方式来解释数据,而非记忆的方式,这与模型在小初始化有凝聚的现象紧密相关。另外,数据中的一些重要统计量是形成嵌入结构的驱动力,并影响模型的推理能力。
Understanding the performance of deep learning on practical problems requires considering the characteristics of the model, the characteristics of the data, and the characteristics of the optimization algorithm that connects the two. This talk analyzes data characteristics from the perspectives of function frequency, effective complexity, signal-to-noise ratio, reasoning complexity, and correlation statistics, and presents experiments designed to probe the characteristics of models and optimization. The aim is to understand the generalization ability of deep learning and the reasoning ability of language models, and to offer some guidance for practical model training. We find that small initialization makes a model more inclined to explain data through reasoning rather than memorization, which is closely related to the parameter condensation observed under small initialization. In addition, certain important statistics of the data drive the formation of embedding structures and influence the model's reasoning ability.
Host
王东教授
Prof. Dong Wang
Date & Time
2025年12月29日(星期一)
上午10:00-12:00
Monday, December 29, 2025
10:00 a.m.-12:00 p.m.
Venue
深圳河套学院B411阶梯教室
(深圳市福田区福保街道红棉路6号
地图导航“深圳河套学院-南门”)
B411 Lecture Hall, Shenzhen Loop Area Institute
(6 Hongmian Road, Fubao Subdistrict, Futian District, Shenzhen; search for "Shenzhen Loop Area Institute (South Gate)" in your map app)
Online link
扫码加入会议
Join the Meeting Online

腾讯会议号:320-722-376
Tencent Meeting ID: 320-722-376