About Me
✨ Biography [CV]
I am a Ph.D. student at Shanghai Jiao Tong University and Shanghai AI Lab, advised by Yali Wang. I received my B.S. degree in Computer Science and Technology from China University of Mining and Technology (Beijing) in 2024. I am currently a Research Intern at Shanghai AI Lab, and I was previously fortunate to take part in internship programs at Samsung and SIAT.
My research interests include:
- Unified Multimodal Understanding and Generation
- Video Understanding
- Video Generation
- Multimodal Agent
🔥🔥🔥 I'm actively seeking internship opportunities in multimodal understanding and generation. Feel free to reach out about potential collaborations.
📑 Publications
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
Zikang Wang*, Boyu Chen*, Zhengrong Yue*, Yi Wang, Yu Qiao, Limin Wang, Yali Wang. arXiv, 2025.
VTTS: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan*, Yinan He*, Xinhao Li*, Zhengrong Yue*, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang. arXiv, 2025.
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen*, Zhengrong Yue*, Siran Chen*, Zikang Wang*, Yang Liu, Peng Li, Yali Wang. arXiv, 2025.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue*, Shaobin Zhuang*, Kunchang Li*, Yanbo Ding*, Yali Wang. Computer Vision and Pattern Recognition (CVPR), 2025.
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang. International Conference on Learning Representations (ICLR), 2025.
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding*, Shaobin Zhuang*, Kunchang Li*, Zhengrong Yue*, Yu Qiao, Yali Wang. Association for the Advancement of Artificial Intelligence (AAAI), 2025.
🤵🏻 Internships
Shanghai Artificial Intelligence Laboratory, General Vision Lab (OpenGVLab)
July 2024: Studied unified video understanding and generation tasks within multimodal frameworks.
Shenzhen Institute of Advanced Technology, Multimedia Lab (MMLAB)
November 2023: Explored video style editing based on MLLM agents.
Samsung R&D Institute China-Beijing, Language Understanding Lab (LUL)
September 2023: Developed a RAG-based multilingual document question-answering large model for Galaxy Z Fold smartphones.
🏅 Honors
- The 23rd China Robotics and Artificial Intelligence Competition, Intelligent Sorting Challenge (National No. 1), National First Prize (2022)
- The 17th National Undergraduate Smart Car Competition, Xunfei Creative Group (National Top Four), National First Prize (2022)
- The 15th National College Student Energy Conservation and Emission Reduction Social Practice and Science and Technology Contest, National Third Prize (2022)
- The 15th China Undergraduate Computer Design Contest, Provincial Third Prize (2022)
🤝 Services
- Attended the CVPR 2023 Beijing Workshop, 2023.06