About Me

Biography [CV]

I am a Ph.D. student at Shanghai Jiao Tong University and Shanghai AI Lab. My advisor is Yali Wang. I received my B.S. degree in Computer Science and Technology from China University of Mining and Technology (Beijing) in 2024. Currently, I am a Research Intern at Shanghai AI Lab. I was fortunate to be involved in internship programs at Samsung and SIAT.

My research interests include:
  • Unified Multimodal Understanding and Generation
  • Video Understanding
  • Video Generation
  • Multimodal Agents
Most of my work focuses on unified multimodal understanding and generation foundation models, covering model design, large-scale pretraining, dataset collection, and benchmark evaluation.

🔥🔥🔥 I'm actively pursuing internship opportunities in Multimodal Understanding and Generation. Feel free to reach out about potential collaborations.

📑 Publications

  • VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

    Zikang Wang*, Boyu Chen*, Zhengrong Yue*, Yi Wang, Yu Qiao, Limin Wang, Yali Wang.

    arXiv, 2025.

    Paper Code

  • VTTS: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

    Ziang Yan*, Yinan He*, Xinhao Li*, Zhengrong Yue*, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang.

    arXiv, 2025.

    Paper Code

  • LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

    Boyu Chen*, Zhengrong Yue*, Siran Chen*, Zikang Wang*, Yang Liu, Peng Li, Yali Wang.

    arXiv, 2025.

    Paper Code

  • V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

    Zhengrong Yue*, Shaobin Zhuang*, Kunchang Li*, Yanbo Ding*, Yali Wang.

    Computer Vision and Pattern Recognition (CVPR), 2025.

    Paper Code

  • TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

    Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang.

    International Conference on Learning Representations (ICLR), 2025.

    Paper Code

  • Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

    Yanbo Ding*, Shaobin Zhuang*, Kunchang Li*, Zhengrong Yue*, Yu Qiao, Yali Wang.

    Association for the Advancement of Artificial Intelligence (AAAI), 2025.

    Paper Code

🤵🏻 Internships

  • Shanghai Artificial Intelligence Laboratory, General Vision Lab (OpenGVLab)

    July 2024: Studied integrated video understanding and generation tasks within multimodal frameworks.
  • Shenzhen Institute of Advanced Technology, Multimedia Lab (MMLAB)

    November 2023: Explored video style editing based on MLLM agents.
  • Samsung R&D Institute China-Beijing, Language Understanding Lab (LUL)

    September 2023: Developed a RAG-based multilingual document question-answering large model for Galaxy Z-Fold smartphones.

🏅 Honors

  • The 23rd China Robotics and Artificial Intelligence Competition, Intelligent Sorting Challenge (National No. 1), National First Prize (2022)
  • The 17th National Undergraduate Smart Car Competition, Xunfei Creative Group (National Top Four), National First Prize (2022)
  • The 15th National College Student Energy Conservation and Emission Reduction Social Practice and Science and Technology Contest, National Third Prize (2022)
  • The 15th China Undergraduate Computer Design Contest, Provincial Third Prize (2022)

🤝 Services

  • Attended the CVPR 2023 Beijing Workshop, 2023.06