I am a third-year Ph.D. student at the Institute of Artificial Intelligence, Peking University, where I am a member of both the CoRe Lab and the General Vision Lab, advised by Tengyu Liu, Siyuan Huang, and Yixin Zhu. I am also a research intern at NVIDIA.

My research focuses on human motion generation, robotics, scene understanding, and AI for scientific discovery, particularly in biology. Going forward, I will focus on enabling embodied agents to execute actions from open-vocabulary instructions in real-world environments. I received my master's degree from Beihang University, where I was advised by Guizhen Yu and Yunpeng Wang.

🔥 News

  • 2026.03: Presented our latest work at NVIDIA GTC 2026
  • 2025.10: 🎉 One paper accepted to the NeurIPS 2025 EMBODIED-WORLD-MODELS Workshop
  • 2025.08: 🎉 One paper accepted to CoRL 2025
  • 2025.06: Presented at the AIR-SUN lab, Institute for AI Industry Research (AIR), Tsinghua University
  • 2025.02: 🎉 One paper accepted to CVPR 2025 as an oral presentation
  • 2024.06: Presented at the CVPR MANGO workshop
  • 2024.03: 🎉 One demo accepted to the 3DV 2024 demo track
  • 2024.02: 🎉 One paper accepted to CVPR 2024

๐Ÿ“ Preprints

In Submission

LessMimic: Long-Horizon Humanoid Interaction with Unified Distance Field Representations
Yutang Lin*, Jieming Cui*, Yixuan Li, Baoxiong Jia, Yixin Zhu†, Siyuan Huang†

Project / Code / Video / Paper

We introduce LessMimic, a unified interaction representation that enables reference-free inference, geometric generalization, and long-horizon skill composition within a single policy.

In Submission

UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots
Nan Jiang*, Zimo He*, Wanhe Yu, Lexi Pang, Yunhao Li, Hongjie Li, Jieming Cui, Yuhan Li, Zizhou Wang, Yixin Zhu†, Siyuan Huang†

Project / Code / Video / Paper

We introduce UniAct, a two-stage framework integrating a fine-tuned MLLM with a causal streaming pipeline, enabling humanoid robots to execute multimodal instructions with sub-500ms latency.

๐Ÿ“ Publications

CoRL 2025

CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks
Yixuan Li*, Yutang Lin*, Jieming Cui, Tengyu Liu, Wei Liang, Yixin Zhu†, Siyuan Huang†

Project / Code / Video / Paper

We introduce CLONE, an MoE-based policy with closed-loop error correction for holistic humanoid teleoperation. It enables capabilities previously unattainable with existing systems, such as whole-body coordination and long-horizon task execution, with minimal input from a commercial MR headset.

CVPR 2025 Oral

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui*, Tengyu Liu*, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu†, Siyuan Huang†

Project / Code / Video / PKU News / PKU Research Highlights

We introduce GROVE, a generalized reward framework that enables open-vocabulary physical skill learning without manual engineering or task-specific demonstrations.

CVPR 2024

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui*, Tengyu Liu*, Nian Liu*, Yaodong Yang, Yixin Zhu†, Siyuan Huang†

Project / Code / Video / Paper

We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions. An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.

NeurIPS 2023

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui*, Ziren Gong*, Baoxiong Jia*, Siyuan Huang, Zilong Zheng†, Jianzhu Ma†, Yixin Zhu†

Project / Code / Video / Paper / PKU Institute for AI Official WeChat

As an initial step toward an automated monitoring system, we curate ProBio, a comprehensive multimodal dataset with fine-grained hierarchical annotations for studying activity understanding in molecular biology labs.

ICCV 2023

Full-Body Articulated Human-Object Interaction
Nan Jiang*, Tengyu Liu*, Zhexuan Cao, Jieming Cui, He Wang, Yixin Zhu†, Siyuan Huang†

Project / Code / Paper

We present CHAIRS, a large-scale motion-captured f-AHOI dataset, consisting of 16.2 hours of versatile interactions between 46 participants and 74 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects during the entire interactive process, as well as realistic and physically plausible full-body interactions.

  • ASCE 2021 Forecasting Freeway On-Ramp Lane-Changing Behavior Based on GRU, Jieming Cui*, Guizhen Yu*, Bin Zhou, Qiujun Liu, Zhengguo Guan†

  • SAE 2021 Three-Dimensional Object Detection Based on Deep Learning in Enclosed Scenario, Jieming Cui*, Guizhen Yu*, Na Zhang, Zhangyu Wang

📖 Education

  • 2023.09 - present, Ph.D., Peking University, Beijing.
  • 2019.09 - 2022.02, M.S., Beihang University, Beijing.
  • 2015.09 - 2019.06, B.S., Beijing Jiaotong University, Beijing.

💻 Internships

  • 2022.02 - present, BIGAI, Beijing.
  • 2021.06 - 2021.09, Amap, Alibaba, Beijing.
  • 2020.02 - 2021.06, Tage Zhixing, Beijing.