【SLAI Seminar】23rd: Bridging the Virtual-Physical Gap: The Path to Generalization of Embodied Intelligence Driven by Foundational Models (Jan 19, 14:30)

January 19, 2026 Forum Schedule

The 23rd session of the SLAI Seminar will discuss "Bridging the Virtual-Physical Gap: The Path to Generalization of Embodied Intelligence Driven by Foundational Models" from 2:30 pm to 4:00 pm on Monday, January 19, in the B411 Lecture Hall. Online participation is welcome (Tencent Meeting ID: 250-582-805).

About the Speaker:

Prof. Su Hang is an associate research fellow in the Department of Computer Science and Technology at Tsinghua University and has been selected for the Young Top Talent program of the National "Ten Thousand Talents Plan". His main research areas are robust machine learning and embodied decision-making. He has published more than 100 papers in CCF-recommended A-class conferences and journals, with more than 15,000 citations on Google Scholar. He has been invited to serve on the editorial boards of top-tier artificial intelligence journals such as IEEE TPAMI and Artificial Intelligence, and he chairs the IEEE Generative AI Security Working Group. His academic awards include the Wu Wenjun Artificial Intelligence Natural Science First Prize, the ICME Platinum Best Paper Award, the MICCAI Young Scientist Award, and the AVSS Best Paper Award. He has also led teams to win several international academic competitions, such as the NeurIPS 2017 Adversarial Attack and Defense Challenge. He currently serves as an executive committee member of the Youth Working Committee of the China Society of Image and Graphics. He previously served as Chair of the VALSE Executive AC Committee, Area Chair for NeurIPS 2021, and Workshop Co-Chair for AAAI 2022.

Abstract:

Limited generalization is a core bottleneck that keeps embodied intelligence from moving beyond the laboratory and adapting to complex real-world environments. With the breakthroughs of foundational models in language and vision, building foundational models for embodied intelligence has become a key pathway to cross-task and cross-platform transfer. This talk focuses on that theme and proposes a systematic, data-driven strategy oriented toward capability evolution. Drawing on three types of data (real-world data, simulated data, and video data), we enhance the generalization capabilities of embodied foundational models in stages. First, starting from high-quality real-world robot data and integrating physical priors with cross-embodiment multimodal diffusion model pre-training, we construct a unified action-space model. In dual-arm manipulation tasks, this model demonstrates strong robustness and transfer performance, significantly improving adaptation to real-world physical environments and control consistency. Building on this, we introduce medium-scale simulated data and, based on the ManiBox framework, propose a bounding-box-guided policy distillation technique to mitigate the Sim2Real transfer gap. Finally, we explore the potential of large-scale, weakly structured video data in weakly supervised scenarios, designing a video-action model that combines diffusion model pre-training with masked action modeling. This drives cross-modal knowledge transfer from visual input to embodied control, further enhancing the model's perceptual generalization and the flexibility of cross-platform deployment.
Overall, this evolutionary path from "high-quality, small-scale" data to "low-quality, large-scale" data provides systematic support for advancing the capabilities of embodied foundational models level by level, laying a theoretical foundation and a technical pathway for their evolution toward generalization and industrialization.
