About Me
✨ Biography [CV]
I am a Ph.D. student at Shanghai Jiao Tong University and Shanghai AI Lab. My advisor is Yali Wang. I received my B.S. degree in Computer Science and Technology from China University of Mining and Technology (Beijing) in 2024. Currently, I am a Research Intern at Shanghai AI Lab. I was fortunate to be involved in internship programs at Samsung and SIAT.
My research interests include:
- Unified Multimodal Understanding and Generation
- Video Understanding
- Video Generation
- Multimodal Agent
🔥🔥🔥 I am actively pursuing internship opportunities in Multimodal Understanding and Generation. Feel free to reach out for potential collaborations.
📑 Publications
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue, Haiyu Zhang, Xiangyu Zeng, Boyu Chen, Chenting Wang, Shaobin Zhuang, Lu Dong, Kunpeng Du, Yi Wang, Limin Wang, Yali Wang. arXiv, 2025.
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
Zhentao Zou, Zhengrong Yue, Kunpeng Du, Binlei Bao, Hanting Li, Haizhen Xie, Guozheng Xu, Yue Zhou, Yali Wang, Jie Hu, Xue Jiang, Xinghao Chen. arXiv, 2025.
G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Boyu Chen, Siran Chen, Zhengrong Yue, Kainan Yan, Chenyun Yu, Beibei Kong, Cheng Lei, Chengxiang Zhuo, Zang Li, Yali Wang. arXiv, 2025.
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
Zikang Wang*, Boyu Chen*, Zhengrong Yue*, Yi Wang, Yu Qiao, Limin Wang, Yali Wang. arXiv, 2025.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan*, Yinan He*, Xinhao Li*, Zhengrong Yue*, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang. NeurIPS 2025.
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen*, Zhengrong Yue*, Siran Chen*, Zikang Wang*, Yang Liu, Peng Li, Yali Wang. ICCV 2025.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue*, Shaobin Zhuang*, Kunchang Li*, Yanbo Ding*, Yali Wang. CVPR 2025.
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang. ICLR 2025.
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding*, Shaobin Zhuang*, Kunchang Li*, Zhengrong Yue*, Yu Qiao, Yali Wang. AAAI 2025.
🤵🏻 Internships
Huawei, Foundation Model Department (Noah's Ark Lab)
July 2025. Unified Multimodal Model and Benchmark.

Shanghai Artificial Intelligence Laboratory, General Vision Lab (OpenGVLab)
July 2024. Pre-training and Representation Learning for Unified Understanding and Generation.

Shenzhen Institute of Advanced Technology, Multimedia Lab (MMLAB)
November 2023. Explored video style editing based on MLLM Agents.

Samsung R&D Institute China-Beijing, Language Understanding Lab (LUL)
September 2023. Developed a multilingual document question-answering large model for Galaxy Z-Fold smartphones based on RAG.
🏅 Honors
- The 23rd China Robotics and Artificial Intelligence Competition, Intelligent Sorting Challenge: National First Prize (National No. 1) (2022)
- The 17th National Undergraduate Smart Car Competition, Xunfei Creative Group: National First Prize (National Top Four) (2022)
- The 15th National College Student Energy Conservation and Emission Reduction Social Practice and Science and Technology Contest: National Third Prize (2022)
- The 15th China Undergraduate Computer Design Contest: Provincial Third Prize (2022)
🤝 Services
- Attended the CVPR 2023 Beijing Workshop, June 2023
