I am Yinda Chen (陈胤达), a Ph.D. candidate jointly trained by the University of Science and Technology of China (USTC) and the Shanghai AI Lab, majoring in Information and Communication Engineering.
My research focuses on multimodal understanding and generation. Initially, I explored these two directions separately, working on representation learning, self-supervised pretraining, and efficient fine-tuning in computer vision and other multimodal tasks. Now, I'm gradually moving toward a unified framework that bridges understanding and generation across various domains.
I have the immense privilege of being advised by Prof. Feng Wu (吴枫, CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong (熊志伟) at USTC, and co-supervised by Prof. Xiaoou Tang (汤晓鸥, IEEE Fellow) at the Shanghai AI Lab. I also have the honor of collaborating closely with Prof. Dong Liu (刘东), Prof. Li Li (李礼), and Prof. Zhihai He (何志海, IEEE Fellow). Additionally, I've been fortunate to conduct research internships at Imperial College London with Dr. Rossella Arcucci and Dr. Che Liu (刘澈), and at the PLA General Hospital (301 Hospital) with Prof. Qionghai Dai (戴琼海, CAE Academician, IEEE Fellow).
Currently, I'm a core member at the Beijing Humanoid Robot Innovation Center (UBTECH Robotics), focusing on embodied intelligence world models and unified models for humanoid robots. Previously, I was an intern at Tencent IEG (Interactive Entertainment Group) through the Qingyun Program, working on TTS (Text-to-Speech) generation and multimodal understanding for the Honor of Kings - Lingbao projects under the guidance of Dr. Liang Du and Dr. Wentao Yao (姚文韬).
Coming from a non-traditional background with dual bachelor's degrees in Environmental Ecological Engineering and Economics from Xiamen University, I've always been fascinated by interdisciplinary research. During my undergraduate years, I had the privilege of being mentored by Prof. Yuanye Zhang, whose passion and rigor guided me into my initial research endeavors. My journey into AI began in 2021 as an active member of WISERCLUB, a data science community that shaped my early curiosity. I'm deeply grateful to my good friend Ziqian Lin (林子谦, now at Peking University Guanghua School of Management, Department of Statistics) for recommending me to this amazing community. As someone who transitioned across fields, I deeply understand the challenges of learning something entirely new, and this experience has made me incredibly grateful for every mentor and collaborator who has guided me along the way.
Thanks to the generous guidance of my mentors and the support of amazing collaborators, I've been fortunate to publish papers at conferences including ICLR, ICML, ICCV, NeurIPS, AAAI, ACL, IJCAI, and MICCAI, and journals including TPAMI, TMI, and JBHI.
I know my talents may not match those of many brilliant pioneers in this field, but I believe in the power of continuous learning and genuine collaboration. I'm always eager to exchange ideas, seek advice from the community, and help those who are just starting their research journey. Please feel free to reach out. I'd be truly honored to connect, collaborate, or simply share thoughts about research!
Research Areas
News
- 2026.01 Awarded the Zenghua Scholarship (增华奖学金) by USTC, a university-level scholarship.
- 2026.01 Two papers were accepted by ICLR 2026. See you in Rio de Janeiro, Brazil!
- 2026.01 Traveling to Singapore for AAAI 2026. Friends interested in embodied intelligence, world models, and medical imaging are welcome to connect!
- 2026.01 One paper on medical image registration was accepted by ICASSP 2026. See you in Barcelona, Spain in May!
- 2025.12 Joined the Beijing Humanoid Robot Innovation Center (UBTECH Robotics) as a core member, focusing on embodied intelligence world models for humanoid robots.
- 2025.11 My advisor Prof. Feng Wu (吴枫) was elected as an Academician of the Chinese Academy of Engineering (CAE). A role model forever.
- 2025.11 One paper was accepted by AAAI 2026.
- 2025.11 One paper was accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology).
- 2025.10 One paper was accepted by JBHI (IEEE Journal of Biomedical and Health Informatics).
- 2025.08 Started as a Qingyun Program intern at Tencent IEG, working on game video scene understanding.
- 2025.06 One paper was accepted by ICCV 2025.
- 2025.05 One paper was accepted by Findings of ACL 2025.
- 2025.05 One paper was accepted by ICML 2025.
- 2025.01 One paper was selected for oral presentation at AAAI 2025.
- 2024.12 Selected as the Principal Investigator of a National Natural Science Foundation of China (NSFC) Ph.D. project.
- 2024.10 One paper was accepted by NeurIPS 2024.
Education
Academic Background
University of Science and Technology of China & Shanghai AI Lab
Ph.D. in Information and Communication Engineering (Expected 2027)
Hefei & Shanghai · Sept 2024 - Present
- Research focus: Machine Learning Theory, Self-Supervised Pretraining, Multimodal Large Models, Image Coding & Compression
- Working with Prof. Feng Wu (CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong
- Co-supervised by Prof. Xiaoou Tang (IEEE Fellow) at Shanghai AI Lab
- Selected coursework: Algorithm Design and Analysis, Statistical Learning, Deep Learning, Reinforcement Learning
- Principal Investigator for NSFC Ph.D. Project (2024)
University of Science and Technology of China
M.S. in Computer Technology
Hefei · Sept 2022 - July 2024
- Recipient of National Graduate Scholarship (2022)
Xiamen University
B.S. in Environmental Ecological Engineering & Economics (Dual Degree)
Xiamen · Sept 2018 - July 2022
- Academic ranking: 1st/31 overall
- Xiamen University Academic Star (2021), CDA Level 1 Certification (2022), Kaggle Expert
- Research advisor: Prof. Yuanye Zhang
Professional Experience
Industry Experience
Beijing Humanoid Robot Innovation Center (UBTECH Robotics)
Core Member, Embodied Intelligence World Model Algorithm Team
Beijing · Dec 2025 - Present
- Conducting research on embodied intelligence world models
- Developing advanced algorithms for humanoid robot perception and decision-making
Tencent Interactive Entertainment Group (IEG)
Qingyun Program Intern
Shanghai · Aug 2025 - Dec 2025
- Developed TTS (Text-to-Speech) systems and multimodal scene understanding for Honor of Kings - Lingbao and League of Legends highlight commentary
- Supervised by Dr. Liang Du and Dr. Wentao Yao
- Designed TTS models for generating game commentary and character voiceovers, exploring generation-understanding synergy
- Developed multimodal algorithms to analyze game video content and automatically generate contextual commentary scripts
- Co-developed MAIN-VLA, a game world model for Game for Peace (和平精英) that models the abstraction of intention and environment for Vision-Language-Action in complex gaming scenarios
Chinese PLA General Hospital (301 Hospital)
Research Intern, Data Compression Group
Beijing · Sept 2023 - Feb 2024
- Collaborated with the team of Prof. Qionghai Dai (CAE Academician, IEEE Fellow) on efficient data compression research
- Designed image-specific compression algorithms for various data modalities
- Achieved a 35% improvement in compression efficiency
Academic Experience
Imperial College London
Research Intern, Data Science Institute
London (Remote) · Nov 2022 - Aug 2023
- Collaborated with Dr. Rossella Arcucci and Dr. Che Liu on multimodal pretraining research
- Developed an image-text contrastive learning framework achieving 93.5% accuracy on classification tasks
- Submitted one journal paper
- Designed and led data mining courses, delivering lectures on clustering and Transformer architectures
- Mentored 20 undergraduate students in machine learning projects and organized 2 campus-wide competitions
- Media coverage
Wang Yanan Institute for Studies in Economics, Xiamen University
Research Assistant, Econometrics
Xiamen · Aug 2020 - Dec 2021
- Assisted Associate Prof. Jiong Zhu with national land economic statistics research
- Conducted visual feature extraction for homestead information and land use analysis
- Developed satellite imagery analysis tools achieving 85% accuracy in identifying land use changes
Selected Publications
For a complete list of publications, please visit my Google Scholar profile.
Note: * denotes equal contribution
Self-Supervised Learning & Pretraining

TokenUnify: Scaling Up Autoregressive Pretraining for Computer Vision
ICCV | October 25, 2025
Yinda Chen*; Haoyuan Shi*; Xiaoyu Liu; Te Shi; Ruobing Zhang; Dong Liu; Zhiwei Xiong; Feng Wu
Resources: Code | Dataset | Weights
TokenUnify proposes a hierarchical predictive coding framework for computer vision, reducing autoregressive error from O(K) to O(√K). It introduces a dataset with 1.2 billion annotated voxels and achieves a 44% improvement over training from scratch.

MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
ICML | July 13, 2025
Jiawen Wang; Yinda Chen* (Theory Contribution & Project Leader); Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong
Resources: Code | Poster
MaskTwins introduces a dual-form complementary masking strategy for domain-adaptive image segmentation, effectively bridging the domain gap through coordinated spatial and feature-level masking mechanisms.

Self-Supervised Computer Vision with Multi-Agent Reinforcement Learning
IJCAI (oral) | August 17, 2023
Yinda Chen; Wei Huang; Shenglong Zhou; Qi Chen; Zhiwei Xiong
Resources: Code | Pretrain Data | CREMI | VNC
This paper proposes a decision-based masked image modeling (MIM) approach for segmentation in computer vision. It uses multi-agent reinforcement learning (MARL) to optimize the masking strategy, outperforming alternatives.

Learning multiscale consistency for self-supervised electron microscopy instance segmentation
ICASSP | April 13, 2024
Yinda Chen; Wei Huang; Xiaoyu Liu; Shiyu Deng; Qi Chen; Zhiwei Xiong
This paper proposes a self-supervised pretraining framework for volumetric instance segmentation of electron microscopy data. It enforces multiscale consistency and shows good performance on instance segmentation tasks.
Medical Image Analysis & Vision-Language

GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image Segmentation
ICCV Workshop | October 25, 2025
Yinda Chen*; Che Liu*; Wei Huang; Xiaoyu Liu; Haoyuan Shi; Sibo Cheng; Rossella Arcucci; Zhiwei Xiong
GTGM extends Vision-Language Pretraining to 3D medical images by leveraging LLMs to generate synthetic textual descriptions, enabling text-guided representation learning without paired medical text. Combined with a negative-free contrastive learning strategy, GTGM achieves state-of-the-art performance across 10 CT/MRI segmentation datasets.

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning
IEEE Journal of Biomedical and Health Informatics | October 16, 2025
Yinda Chen*; Yangfan He*; Jing Yang; Dapeng Zhang; Zhenlong Yuan; Muhammad Attique Khan; Jamel Baili; Por Lip Yee
EMPOWER proposes an evolutionary framework for prompt optimization through specialized representation learning and multi-dimensional evaluation. It achieves a 24.7% reduction in factual errors and 15.3% higher preference scores.

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
MICCAI | October 06, 2024
Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong
This paper presents BIMCV-R, a 3D CT text-image retrieval dataset, and MedFinder. Tests show MedFinder outperforms baselines in related tasks.

Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks
IEEE Transactions on Medical Imaging | July 29, 2024
Shiyu Deng; Yinda Chen; Wei Huang; Ruobing Zhang; Zhiwei Xiong
The paper proposes an unsupervised domain adaptation method for EM image denoising with invertible networks, outperforming existing methods.
Image Compression

Learned Image Coding with Generative Reference of Conditional Latents
IEEE Transactions on Pattern Analysis and Machine Intelligence | Accepted, 2025
Siqi Wu*; Yinda Chen*; Weiming Chen; Dong Liu; K. C. Ho; Zhihai He
GRCL presents a generic framework that exploits semantically correlated external images as conditional coding references in the latent domain. Three reference generation methods are investigated: local dictionary retrieval, web-based image search, and diffusion-based image-text-image generation. Theoretical analysis proves robustness to reference perturbations via subspace recovery error bounds. GRCL achieves up to 1.5 dB PSNR gain over state-of-the-art methods with only ~0.005 bpp overhead.

Condition-generation Latent Coding with an External Dictionary for Deep Image Compression
AAAI (oral) | March 06, 2025
Siqi Wu; Yinda Chen*; Dong Liu; Zhihai He
Resources: Code | Weights
The paper proposes CLC (Condition-generation Latent Coding) for deep image compression. It uses an external dictionary to generate conditional references, shows good performance, and is supported by theoretical analysis.
Image Segmentation & Synthesis

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
NeurIPS | October 17, 2024
Haotian Qian; Yinda Chen*; Shengtao Lou; Fahad Shahbaz Khan; Xiaogang Jin; Deng-Ping Fan
Resources: Project | Code
MaskFactory proposes a two-stage method to generate high-quality synthetic datasets for DIS, outperforming existing methods in quality and efficiency.
Honors and Awards
National Awards & Research Funding
- National Natural Science Foundation of China (NSFC) PhD Program (December 2024)
  - Role: Principal Investigator
  - Achievement: Sole awardee in Information Science, Anhui Province
  - Description: Prestigious national research funding program for doctoral students
- Interdisciplinary Contest in Modeling (ICM), Outstanding Winner (May 2024)
  - Competition: International mathematical modeling competition organized by COMAP
  - Ranking: Top 0.17% of 10,388 participating teams worldwide
  - Resources: Paper & Code
- National Graduate Scholarship (国家奖学金) (December 2022)
  - Achievement: National award for academic performance and research contributions (Top 1%)
  - Reference: Official Announcement
- National Undergraduate Mathematics Competition (Non-Major Category), Second Prize (May 2021)
  - Competition: National finals organized by the Chinese Mathematical Society
  - Progress: Advanced from provincial first place
  - Reference: Competition News
- National Undergraduate Mathematics Competition (Non-Major Category), First Prize (November 2020)
  - Competition: High-level national mathematics competition for undergraduate students
  - Ranking: Provincial First Place, Fujian Province
Academic Honors & Scholarships
- Zenghua Scholarship (增华奖学金) (January 2026)
  - Level: University-level scholarship, USTC
  - Achievement: Awarded for outstanding academic performance and research contributions
- Xiamen University Academic Star (学术之星) (December 2021)
  - Achievement: Sole undergraduate awardee university-wide
  - Recognition: Outstanding academic achievements and research excellence
  - Media: University News | Feature Report
Competition Awards
- "Jingrun Cup" Mathematics Competition (Professional Category), First Prize (September 2021)
  - Level: Campus-level competition named after renowned mathematician Chen Jingrun
  - Ranking: University First Place
  - Reference: Competition News
- "Internet+" Innovation and Entrepreneurship Competition, Gold Medal (August 2021)
  - Level: Provincial-level competition
  - Region: Fujian Province
- "Challenge Cup" National Undergraduate Academic Science and Technology Competition, First Prize (May 2021)
  - Level: Provincial-level competition
  - Region: Fujian Province
Talks, Skills & Service
Invited Talks
Technical Skills
Professional Service
Hobbies & Interests
Feel free to hit me up for karaoke, board games, badminton, hiking, or just grabbing food and chatting about life!
Let's Connect
Get in Touch
Email: cyd0806@mail.ustc.edu.cn
I'm eager to connect with fellow deep learning enthusiasts and researchers passionate about advancing AI.
USTC Gaoxin Campus, Hefei, Anhui, China