About Me

✨ Biography [CV]

I am a Ph.D. student at Shanghai Jiao Tong University and Shanghai AI Lab, advised by Yali Wang. I received my B.S. degree in Computer Science and Technology from China University of Mining and Technology (Beijing) in 2024. I am currently a Research Intern at Shanghai AI Lab, and I was previously fortunate to take part in internship programs at Samsung and SIAT.

My research interests include:

Unified Multimodal Understanding and Generation
Video Understanding
Video Generation
Multimodal Agent

Most of my work focuses on foundation models for unified multimodal understanding and generation, covering model design, large-scale pretraining, dataset collection, and benchmark evaluation.

🔥🔥🔥 I'm actively seeking internship opportunities in multimodal understanding and generation. Feel free to reach out about potential collaborations.

📑 Publications

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
Zhengrong Yue, Haiyu Zhang, Xiangyu Zeng, Boyu Chen, Chenting Wang, Shaobin Zhuang, Lu Dong, Kunpeng Du, Yi Wang, Limin Wang, Yali Wang
arXiv, 2025.
Paper Code
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing
Zhentao Zou, Zhengrong Yue, Kunpeng Du, Binlei Bao, Hanting Li, Haizhen Xie, Guozheng Xu, Yue Zhou, Yali Wang, Jie Hu, Xue Jiang, Xinghao Chen
arXiv, 2025.
Paper Code
G-UBS: Towards Robust Understanding of Implicit Feedback via Group-Aware User Behavior Simulation
Boyu Chen, Siran Chen, Zhengrong Yue, Kainan Yan, Chenyun Yu, Beibei Kong, Cheng Lei, Chengxiang Zhuo, Zang Li, Yali Wang
arXiv, 2025.
Paper Code
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning
Zikang Wang*, Boyu Chen*, Zhengrong Yue*, Yi Wang, Yu Qiao, Limin Wang, Yali Wang.
arXiv, 2025.
Paper Code
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan*, Yinan He*, Xinhao Li*, Zhengrong Yue*, Xiangyu Zeng, Yali Wang, Yu Qiao, Limin Wang, Yi Wang.
NeurIPS 2025.
Paper Code
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
Boyu Chen*, Zhengrong Yue*, Siran Chen*, Zikang Wang*, Yang Liu, Peng Li, Yali Wang.
ICCV 2025.
Paper Code
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue*, Shaobin Zhuang*, Kunchang Li*, Yanbo Ding*, Yali Wang.
CVPR 2025.
Paper Code
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng, Kunchang Li, Chenting Wang, Xinhao Li, Tianxiang Jiang, Ziang Yan, Songze Li, Yansong Shi, Zhengrong Yue, Yi Wang, Yali Wang, Yu Qiao, Limin Wang.
ICLR 2025.
Paper Code
Muses: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding*, Shaobin Zhuang*, Kunchang Li*, Zhengrong Yue*, Yu Qiao, Yali Wang.
AAAI 2025.
Paper Code

🤵🏻 Internships

Huawei, Foundation Model Department (Noah's Ark Lab)
July 2025 Unified Multimodal Model and Benchmark.
Shanghai Artificial Intelligence Laboratory, General Vision Lab (OpenGVLab)
July 2024 Pre-training and Representation Learning for Unified Understanding and Generation.
Shenzhen Institute of Advanced Technology, Multimedia Lab (MMLAB)
November 2023 Explored video style editing based on MLLM Agents.
Samsung R&D Institute China-Beijing, Language Understanding Lab (LUL)
September 2023 Developed a RAG-based multilingual document question-answering large model for Galaxy Z-Fold smartphones.

🏅 Honors

The 23rd China Robotics and Artificial Intelligence Competition, Intelligent Sorting Challenge (National No. 1), National First Prize (2022)
The 17th National Undergraduate Smart Car Competition, Xunfei Creative Group (National Top Four), National First Prize (2022)
The 15th National College Student Energy Conservation and Emission Reduction Social Practice and Science and Technology Contest, National Third Prize (2022)
The 15th China Undergraduate Computer Design Contest, Provincial Third Prize (2022)

🤝 Services

Attended the CVPR 2023 Beijing Workshop, 2023.06

Zhengrong Yue