I am Yinda Chen (陈胤达), a Ph.D. candidate jointly trained by the University of Science and Technology of China (USTC) and the Shanghai AI Lab, majoring in Information and Communication Engineering.

I know my talents may not match those of many brilliant pioneers in this field, but I believe in the power of continuous learning and genuine collaboration. I'm always eager to exchange ideas, seek advice from the community, and help those who are just starting their research journey. Please feel free to reach out. I'd be truly honored to connect, collaborate, or simply share thoughts about research! 🌟

🎯 Research Areas


🔥 News

  • 2026.01 🎓 Awarded the Zenghua Scholarship (增华奖学金) by USTC, a university-level scholarship.
  • 2026.01 📄 Two papers were accepted by ICLR 2026. See you in Rio de Janeiro, Brazil!
  • 2026.01 ✈️ Traveling to Singapore for AAAI 2026. Friends interested in embodied intelligence, world models, and medical imaging are welcome to connect!
  • 2026.01 📄 One paper on medical image registration was accepted by ICASSP 2026. See you in Barcelona, Spain in May!
  • 2025.12 💼 Joined Beijing Humanoid Robot Innovation Center (UBTECH Robotics) as a core member, focusing on embodied intelligence world models for humanoid robots.
  • 2025.11 🎉 My advisor Prof. Feng Wu (吴枫) was elected as an Academician of the Chinese Academy of Engineering (CAE). A role model forever.
  • 2025.11 📄 One paper was accepted by AAAI 2026.
  • 2025.11 📄 One paper was accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology).
  • 2025.10 📄 One paper was accepted by JBHI (IEEE Journal of Biomedical and Health Informatics).
  • 2025.08 💼 Started as a Qingyun Program intern at Tencent IEG, working on game video scene understanding.
  • 2025.06 📄 One paper was accepted by ICCV 2025.
  • 2025.05 📄 One paper was accepted by Findings of ACL 2025.
  • 2025.05 📄 One paper was accepted by ICML 2025.
  • 2025.01 📄 One paper was selected for oral presentation at AAAI 2025.
  • 2024.12 🏅 Selected as principal investigator of a Ph.D. Natural Science Foundation project.
  • 2024.10 📄 One paper was accepted by NeurIPS 2024.


📖 Education

🎓 Academic Background (3)

University of Science and Technology of China & Shanghai AI Lab

Ph.D. in Information and Communication Engineering (Expected 2027)

📍 Hefei & Shanghai · Sept 2024 - Present

  • Research focus: Machine Learning Theory, Self-Supervised Pretraining, Multimodal Large Models, Image Coding & Compression
  • Working with Prof. Feng Wu (CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong
  • Co-supervised by Prof. Xiaoou Tang (IEEE Fellow) at Shanghai AI Lab
  • Selected coursework: Algorithm Design and Analysis, Statistical Learning, Deep Learning, Reinforcement Learning
  • Principal Investigator for NSFC Ph.D. Project (2024)

University of Science and Technology of China

M.S. in Computer Technology

📍 Hefei · Sept 2022 - July 2024

  • Recipient of National Graduate Scholarship (2022)

Xiamen University

B.S. in Environmental Ecological Engineering & Economics (Dual Degree)

📍 Xiamen · Sept 2018 - July 2022


💼 Professional Experience

🏢 Industry Experience (3)

Beijing Humanoid Robot Innovation Center (UBTECH Robotics)

Core Member, Embodied Intelligence World Model Algorithm Team

📍 Beijing · Dec 2025 - Present

  • Conducting research on embodied intelligence world models
  • Developing advanced algorithms for humanoid robot perception and decision-making

Tencent Interactive Entertainment Group (IEG)

Qingyun Program Intern

📍 Shanghai · Aug 2025 - Dec 2025

  • Developing TTS (Text-to-Speech) systems and multimodal scene understanding for Honor of Kings – Lingbao and League of Legends highlight commentary
  • Supervised by Dr. Liang Du and Dr. Wentao Yao
  • Designed TTS models for generating game commentary and character voiceovers, exploring generation-understanding synergy
  • Developed multimodal algorithms to analyze game video content and automatically generate contextual commentary scripts
  • Co-developed MAIN-VLA: a game world model for Game for Peace (和平精英), modeling abstraction of intention and environment for Visual-Language-Action in complex gaming scenarios

Chinese PLA General Hospital (301 Hospital)

Research Intern, Data Compression Group

📍 Beijing · Sept 2023 - Feb 2024

  • Collaborated with the team of Prof. Qionghai Dai (CAE Academician, IEEE Fellow) on efficient data compression research
  • Designed image-specific compression algorithms for various data modalities
  • Achieved 35% improvement in compression efficiency
🎓 Academic Experience (3)

Imperial College London

Research Intern, Data Science Institute

📍 London (Remote) · Nov 2022 - Aug 2023

  • Collaborated with Dr. Rossella Arcucci and Dr. Che Liu on multimodal pretraining research
  • Developed image-text contrastive learning framework achieving 93.5% accuracy on classification tasks
  • Submitted one journal paper

Xiamen University WISER Club

Insider, Data Mining Group

📍 Xiamen · Aug 2021 - July 2022

  • Designed and led data mining courses, delivering lectures on clustering and Transformer architectures
  • Mentored 20 undergraduate students in machine learning projects and organized 2 campus-wide competitions
  • Media coverage

Wang Yanan Institute for Studies in Economics, Xiamen University

Research Assistant, Econometrics

📍 Xiamen · Aug 2020 - Dec 2021

  • Assisted Associate Prof. Jiong Zhu in national land economic statistics research
  • Conducted visual feature extraction for homestead information and land use analysis
  • Developed satellite imagery analysis tools achieving 85% accuracy in identifying land use changes


📝 Selected Publications

For a complete list of publications, please visit my Google Scholar profile


Note: * denotes equal contribution

🧠 Self-Supervised Learning & Pretraining (4)
ICCV 2025
CCF A
TokenUnify

TokenUnify: Scaling Up Autoregressive Pretraining for Computer Vision
Tags: Computer Vision, Self-Supervised Learning
ICCV | October 25, 2025
Yinda Chen*; Haoyuan Shi*; Xiaoyu Liu; Te Shi; Ruobing Zhang; Dong Liu; Zhiwei Xiong; Feng Wu

Code Dataset Weights

TokenUnify proposes a hierarchical predictive coding framework for computer vision, reducing autoregressive error from O(K) to O(√K). It introduces a dataset with 1.2 billion annotated voxels and achieves a 44% improvement over training from scratch.

ICML 2025
CCF A
MaskTwins

MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
Tags: Domain Adaptation, Pretraining Methods
ICML | July 13, 2025
Jiawen Wang; Yinda Chen* (Theory Contribution & Project Leader); Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong

Code Poster

MaskTwins introduces a dual-form complementary masking strategy for domain-adaptive image segmentation, effectively bridging the domain gap through coordinated spatial and feature-level masking mechanisms.

IJCAI 2023
CCF A
dbMiM

Self-Supervised Computer Vision with Multi-Agent Reinforcement Learning
Tags: Computer Vision, Self-Supervised Learning
IJCAI (oral) | August 17, 2023
Yinda Chen; Wei Huang; Shenglong Zhou; Qi Chen; Zhiwei Xiong

Code Pretrain Data CREMI VNC

This paper proposes a decision-based masked image modeling (MIM) method for computer vision segmentation. It uses multi-agent reinforcement learning (MARL) to optimize the masking strategy, outperforming alternatives.

ICASSP 2024
CCF B
MS-Con

Learning multiscale consistency for self-supervised electron microscopy instance segmentation
Tags: Computer Vision, Pretraining Methods
ICASSP | April 13, 2024
Yinda Chen; Wei Huang; Xiaoyu Liu; Shiyu Deng; Qi Chen; Zhiwei Xiong

Code

A pretraining framework for volume instance segmentation is proposed. It enforces multiscale consistency and shows good performance in instance segmentation tasks.

🏥 Medical Image Analysis & Vision-Language (4)
ICCV Workshop 2025
GTGM

GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image Segmentation
Tags: Vision-Language, Medical Imaging
ICCV Workshop | October 25, 2025
Yinda Chen*; Che Liu*; Wei Huang; Xiaoyu Liu; Haoyuan Shi; Sibo Cheng; Rossella Arcucci; Zhiwei Xiong

Code

GTGM extends Vision-Language Pretraining to 3D medical images by leveraging LLMs to generate synthetic textual descriptions, enabling text-guided representation learning without paired medical text. Combined with a negative-free contrastive learning strategy, GTGM achieves state-of-the-art performance across 10 CT/MRI segmentation datasets.

IEEE JBHI
SCI Q1 | IF: 6.7
EMPOWER

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning
Tags: Vision-Language, Multimodal Learning
IEEE Journal of Biomedical and Health Informatics | October 16, 2025
Yinda Chen*; Yangfan He*; Jing Yang; Dapeng Zhang; Zhenlong Yuan; Muhammad Attique Khan; Jamel Baili; Por Lip Yee

EMPOWER proposes an evolutionary framework for prompt optimization through specialized representation learning and multi-dimensional evaluation. It achieves a 24.7% reduction in factual errors and 15.3% higher preference scores.

MICCAI 2024
CCF B
BIMCV-R

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
Tags: Vision-Language, Multimodal Learning
MICCAI | October 06, 2024
Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong

Dataset

This paper presents BIMCV-R, a 3D CT text-image retrieval dataset, and MedFinder. Tests show MedFinder outperforms baselines in related tasks.

IEEE TMI
SCI Q1 | IF: 10.6
DADn

Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks
Tags: Domain Adaptation, Image Denoising
IEEE Transactions on Medical Imaging | July 29, 2024
Shiyu Deng; Yinda Chen; Wei Huang; Ruobing Zhang; Zhiwei Xiong

Code

The paper proposes an unsupervised domain adaptation method for EM image denoising with invertible networks, outperforming existing methods.

📦 Image Compression (2)
IEEE TPAMI
SCI Q1 | IF: 20.8
GRCL Framework

Learned Image Coding with Generative Reference of Conditional Latents
Tags: Image Compression, Generative Models
IEEE Transactions on Pattern Analysis and Machine Intelligence | Accepted, 2025
Siqi Wu*; Yinda Chen*; Weiming Chen; Dong Liu; K. C. Ho; Zhihai He

Code

GRCL presents a generic framework that exploits semantically correlated external images as conditional coding references in the latent domain. Three reference generation methods are investigated: local dictionary retrieval, web-based image search, and diffusion-based image-text-image generation. Theoretical analysis proves robustness to reference perturbations via subspace recovery error bounds. The method achieves up to 1.5 dB PSNR gain over state-of-the-art methods with only ~0.005 bpp overhead.

AAAI 2025
CCF A
CLC

Condition-generation Latent Coding with an External Dictionary for Deep Image Compression
Tags: Image Compression
AAAI (oral) | March 06, 2025
Siqi Wu; Yinda Chen*; Dong Liu; Zhihai He

Code Weights

CLC is a deep image compression method that uses an external dictionary to generate conditional references for latent coding. The paper reports good performance and includes a theoretical analysis.

🎨 Image Segmentation & Synthesis (1)
NeurIPS 2024
CCF A
MaskFactory

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Tags: Multimodal Learning
NeurIPS | October 17, 2024
Haotian Qian; Yinda Chen*; Shengtao Lou; Fahad Shahbaz Khan; Xiaogang Jin; Deng-Ping Fan

Project Code

MaskFactory proposes a two-stage method to generate high-quality synthetic datasets for DIS, outperforming existing methods in quality and efficiency.


🥇 Honors and Awards

🏅 National Awards & Research Funding (5)
🎓 Academic Honors & Scholarships (2)
🏆 Competition Awards (3)


💬 Talks, Skills & Service

🎤 Invited Talks (2)
2023.03
Vision-Language Pretraining Using Generative Methods
JD AI Team: Talk on leveraging generative techniques in VLMs for pretraining
2021.08
Data Mining Course: Clustering and Feature Extraction
Xiamen University, WISER Club: Designed and delivered data mining lectures
💻 Technical Skills
💻 Programming
Python, C, C++, Java, MATLAB, LaTeX, Mathematica
🧠 Deep Learning
PyTorch, TensorFlow, DeepSpeed, DDP
📊 Data & Analysis
Pandas, NumPy
🌐 Web Development
HTML, CSS, JavaScript, Vue
🛠️ Tools & Infrastructure
Git, Docker, CUDA, HPC
🌍 Languages
English (TOEFL 110, GRE 328), Chinese (Native)
🎓 Professional Service
Computer Vision & Multimedia
CVPR 2025, ICCV 2025, WACV 2026, MICCAI 2025, ACM MM 2024
Machine Learning & AI
NeurIPS 2024, ICML 2025, ICLR 2024, AAAI, AISTATS 2024
Journals
IJCV, TIP


🎯 Hobbies & Interests

🎤 Singing
Love belting out tunes and exploring different music styles
🍳 Cooking
Always experimenting with new recipes; my kitchen is my lab!
🏸 Sports
Into badminton, basketball, and table tennis. Also hitting the gym as a total newbie!
🎲 Board Games
Obsessed with Splendor, Catan, Ticket to Ride, Azul, 7 Wonders, Carcassonne, and Wingspan. Game night anyone?
⛰️ Hiking & Traveling
Love chasing sunrises on mountain peaks. Explored Huangshan, Jiuhuashan, Zhuhai, Changsha, Istanbul, Morocco's Sahara, Seoul, and Singapore.
🎮 Gaming
Not just building Honor of Kings (Lingbao developer!) but rocking National Server Nezha, Golden Badge Nakoruru, and ranked Top 50 Jungler in Hefei.

Feel free to hit me up for karaoke, board games, badminton, hiking, or just grabbing food and chatting about life!

📬 Let's Connect

Get in Touch

📫 Email: cyd0806@mail.ustc.edu.cn

💼 I'm eager to connect with fellow deep learning enthusiasts and researchers passionate about advancing AI.

📍 USTC Gaoxin campus, Hefei, Anhui, China
