I am Yinda Chen (陈胤达), a Ph.D. candidate jointly trained by the University of Science and Technology of China (USTC) and the Shanghai AI Lab, majoring in Information and Communication Engineering.

  • 👀 My research focuses on multimodal understanding and generation. Initially, I explored these two directions separately—working on representation learning, self-supervised pretraining, and efficient fine-tuning in computer vision and other multimodal tasks. Now, I’m gradually moving toward a unified framework that bridges understanding and generation across various domains.

  • 🧭 I have the immense privilege of being advised by Prof. Feng Wu (吴枫, CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong (熊志伟) at USTC, and working under the collaborative supervision of Prof. Xiaoou Tang (汤晓鸥, IEEE Fellow) at the Shanghai AI Lab. I also have the honor to collaborate closely with Prof. Dong Liu (刘东), Prof. Li Li (李礼), and Prof. Zhihai He (何志海, IEEE Fellow). Additionally, I’ve been fortunate to conduct research internships at Imperial College London with Dr. Rossella Arcucci and Dr. Che Liu (刘澈), and the PLA General Hospital (301 Hospital) with Prof. Qionghai Dai (戴琼海, CAE Academician, IEEE Fellow).

  • 🌱 Currently, I’m a core member at the Beijing Humanoid Robot Innovation Center (Tiangong Robot), focusing on embodied intelligence world models and unified models for humanoid robots. Previously, I was an intern at Tencent IEG (Interactive Entertainment Group) through the Qingyun Program, working on TTS (Text-to-Speech) generation and multimodal understanding for Honor of Kings – Lingbao projects under the guidance of Dr. Liang Du (杜量) and Dr. Wentao Yao (姚文韬).

  • 💞️ Coming from a non-traditional background with dual bachelor’s degrees in Environmental Ecological Engineering and Economics at Xiamen University, I’ve always been fascinated by interdisciplinary research. During my undergraduate years, I had the privilege of being mentored by Prof. Yuanye Zhang (张原野), whose passion and rigor guided me into my initial research endeavors. My journey into AI began in 2021 as an active member of WISERCLUB, a data science community that shaped my early curiosity—I’m deeply grateful to my good friend Ziqian Lin (林子谦) (now at Peking University Guanghua School of Management, Department of Statistics) for recommending me to this amazing community. As someone who transitioned across fields, I deeply understand the challenges of learning something entirely new, and this experience has made me incredibly grateful for every mentor and collaborator who has guided me along the way.

  • 📝 Thanks to the generous guidance of my mentors and the support of amazing collaborators, I’ve been fortunate to publish papers at conferences including AAAI, NeurIPS, MICCAI, ICCV, and IJCAI.

  • 📄 My CV is available in Chinese and English.

I know my talents may not match those of many brilliant pioneers in this field, but I believe in the power of continuous learning and genuine collaboration. I’m always eager to exchange ideas, seek advice from the community, and help those who are just starting their research journey. Please feel free to reach out—I’d be truly honored to connect, collaborate, or simply share thoughts about research! 🌟

🎯 Research Areas


🔥 News

  • 2025.12 Joined Beijing Humanoid Robot Innovation Center (Tiangong Robot) as a core member of the Embodied Intelligence World Model Algorithm Team, focusing on embodied intelligence world models and unified models for humanoid robots.
  • 2025.11.21 My advisor Prof. Feng Wu (吴枫) was elected as an Academician of the Chinese Academy of Engineering (CAE). The CAE Academician is the highest academic honor in engineering and technology in China, representing the nation’s most prestigious recognition for outstanding contributions to engineering innovation and scientific advancement. I will always learn from his dedication to research and mentorship. A role model forever.
  • 2025.11.09 One paper was accepted by AAAI 2026.
  • 2025.11.05 One paper was accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology).
  • 2025.10.16 One paper was accepted by JBHI (IEEE Journal of Biomedical and Health Informatics).
  • 2025.08 Started as a Qingyun Program intern at Tencent IEG, working on game video scene understanding.
  • 2025.06.26 One paper was accepted by ICCV 2025.
  • 2025.05.15 One paper was accepted by ACL 2025 Findings.
  • 2025.05.01 One paper was accepted by ICML 2025.
  • 2025.01.18 One paper was selected for oral presentation at AAAI 2025.
  • 2024.12.06 I was successfully selected as the principal investigator of the Ph.D. Natural Science Foundation Project.
  • 2024.10.10 One paper was accepted by NeurIPS 2024.

📖 Education

  • University of Science and Technology of China & Shanghai AI Lab

    Ph.D. in Information and Communication Engineering (Expected 2027)

    📍 Hefei & Shanghai · Sept 2024 - Present

    • Research focus: Machine Learning Theory, Self-Supervised Pretraining, Multimodal Large Models, Image Coding & Compression
    • Working with Prof. Feng Wu (CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong
    • Research collaboration with Prof. Xiaoou Tang (IEEE Fellow) at Shanghai AI Lab
    • Selected coursework: Algorithm Design and Analysis, Statistical Learning, Deep Learning, Reinforcement Learning
    • Principal Investigator for NSFC Ph.D. Project (2024)
  • University of Science and Technology of China

    M.S. in Computer Technology

    📍 Hefei · Sept 2022 - July 2024

    • Recipient of National Graduate Scholarship (2022)
  • Xiamen University

    B.S. in Environmental Ecological Engineering & Economics (Dual Degree)

    📍 Xiamen · Sept 2018 - July 2022


💼 Professional Experience

  • Beijing Humanoid Robot Innovation Center (Tiangong Robot)

    Core Member, Embodied Intelligence World Model Algorithm Team

    📍 Beijing · Dec 2025 - Present

    • Conducting research on embodied intelligence world models
    • Developing advanced algorithms for humanoid robot perception and decision-making
  • Tencent Interactive Entertainment Group (IEG)

    Qingyun Program Intern

    📍 Shanghai · Aug 2025 - Dec 2025

    • Developing TTS (Text-to-Speech) systems and multimodal scene understanding for Honor of Kings – Lingbao and League of Legends highlight commentary
    • Supervised by Dr. Liang Du and Dr. Wentao Yao
    • Designed TTS models for generating game commentary and character voiceovers, exploring generation-understanding synergy
    • Developed multimodal algorithms to analyze game video content and automatically generate contextual commentary scripts
  • Chinese PLA General Hospital (301 Hospital)

    Research Intern, Data Compression Group

    📍 Beijing · Sept 2023 - Feb 2024

    • Collaborated with the team of Prof. Qionghai Dai (CAE Academician, IEEE Fellow) on efficient data compression research
    • Designed image-specific compression algorithms for various data modalities
    • Achieved a 35% improvement in compression efficiency
  • Imperial College London

    Research Intern, Data Science Institute

    📍 London (Remote) · Nov 2022 - Aug 2023

    • Collaborated with Dr. Rossella Arcucci and Dr. Che Liu on multimodal pretraining research
    • Developed an image-text contrastive learning framework achieving 93.5% accuracy on classification tasks
    • Submitted one journal paper
  • Xiamen University WISER Club

    Insider, Data Mining Group

    📍 Xiamen · Aug 2021 - July 2022

    • Designed and led data mining courses, delivering lectures on clustering and Transformer architectures
    • Mentored 20 undergraduate students in machine learning projects and organized 2 campus-wide competitions
    • Media coverage
  • Wang Yanan Institute for Studies in Economics, Xiamen University

    Research Assistant, Econometrics

    📍 Xiamen · Aug 2020 - Dec 2021

    • Assisted Associate Prof. Jiong Zhu in national land economic statistics research
    • Conducted visual feature extraction for homestead information and land use analysis
    • Developed satellite imagery analysis tools achieving 85% accuracy in identifying land use changes


📝 Selected Publications

For a complete list of publications, please visit my Google Scholar profile.

Note: * denotes equal contribution

📰 Journal Articles

IEEE JBHI
SCI Q1 | IF: 6.7

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning
Vision-Language | Multimodal Learning
IEEE Journal of Biomedical and Health Informatics | October 16, 2025
Yinda Chen*; Yangfan He*; Jing Yang; Dapeng Zhang; Zhenlong Yuan; Muhammad Attique Khan; Jamel Baili; Por Lip Yee

EMPOWER proposes an evolutionary framework for prompt optimization through specialized representation learning and multi-dimensional evaluation. It achieves a 24.7% reduction in factual errors and 15.3% higher preference scores.

IEEE TMI
SCI Q1 | IF: 10.6

Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks
Domain Adaptation | Image Denoising
IEEE Transactions on Medical Imaging | July 29, 2024
Shiyu Deng; Yinda Chen; Wei Huang; Ruobing Zhang; Zhiwei Xiong

Code

The paper proposes an unsupervised domain adaptation method for EM image denoising with invertible networks, outperforming existing methods.

🎓 Conference Papers

ICCV Workshop 2025

GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image Segmentation
Vision-Language | Medical Imaging
ICCV Workshop | October 25, 2025
Yinda Chen*; Che Liu*; Wei Huang; Xiaoyu Liu; Haoyuan Shi; Sibo Cheng; Rossella Arcucci; Zhiwei Xiong

Code

GTGM extends Vision-Language Pretraining to 3D medical images by leveraging LLMs to generate synthetic textual descriptions, enabling text-guided representation learning without paired medical text. Combined with a negative-free contrastive learning strategy, GTGM achieves state-of-the-art performance across 10 CT/MRI segmentation datasets.

ICCV 2025
CCF A

TokenUnify: Scaling Up Autoregressive Pretraining for Computer Vision
Computer Vision | Self-Supervised Learning
ICCV | October 25, 2025
Yinda Chen*; Haoyuan Shi*; Xiaoyu Liu; Te Shi; Ruobing Zhang; Dong Liu; Zhiwei Xiong; Feng Wu

Code Dataset Weights

TokenUnify proposes a hierarchical predictive coding framework for computer vision, reducing autoregressive error from O(K) to O(√K). It introduces a dataset with 1.2 billion annotated voxels and achieves a 44% improvement over training from scratch.

ICML 2025
CCF A

MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
Domain Adaptation | Pretraining Methods
ICML | July 13, 2025
Jiawen Wang; Yinda Chen* (Theory Contribution & Project Leader); Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong

Code Poster

MaskTwins introduces a dual-form complementary masking strategy for domain-adaptive image segmentation, effectively bridging the domain gap through coordinated spatial and feature-level masking mechanisms.

AAAI 2025
CCF A

Condition-generation Latent Coding with an External Dictionary for Deep Image Compression
Image Compression
AAAI (oral) | March 06, 2025
Siqi Wu; Yinda Chen*; Dong Liu; Zhihai He

Code Weights

The paper proposes Condition-generation Latent Coding (CLC) for deep image compression, which uses an external dictionary to generate references for coding. It reports strong performance and includes a theoretical analysis.

NeurIPS 2024
CCF A

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Multimodal Learning
NeurIPS | October 17, 2024
Haotian Qian; Yinda Chen*; Shengtao Lou; Fahad Shahbaz Khan; Xiaogang Jin; Deng-Ping Fan

Project Code

MaskFactory proposes a two-stage method to generate high-quality synthetic datasets for dichotomous image segmentation (DIS), outperforming existing methods in quality and efficiency.

MICCAI 2024
CCF B

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
Vision-Language | Multimodal Learning
MICCAI | October 06, 2024
Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong

Dataset

This paper presents BIMCV-R, a 3D CT text-image retrieval dataset, together with MedFinder, an accompanying retrieval method. Experiments show that MedFinder outperforms baselines on related tasks.

ICASSP 2024
CCF B

Learning multiscale consistency for self-supervised electron microscopy instance segmentation
Computer Vision | Pretraining Methods
ICASSP | April 13, 2024
Yinda Chen; Wei Huang; Xiaoyu Liu; Shiyu Deng; Qi Chen; Zhiwei Xiong

Code

The paper proposes a self-supervised pretraining framework for volumetric instance segmentation in electron microscopy. It enforces multiscale consistency and performs well on instance segmentation tasks.

IJCAI 2023
CCF A

Self-Supervised Computer Vision with Multi-Agent Reinforcement Learning
Computer Vision | Self-Supervised Learning
IJCAI (oral) | August 17, 2023
Yinda Chen; Wei Huang; Shenglong Zhou; Qi Chen; Zhiwei Xiong

Code Pretrain Data CREMI VNC

This paper proposes a decision-based masked image modeling (MIM) method for computer vision segmentation. It uses multi-agent reinforcement learning (MARL) to optimize the masking strategy and outperforms alternative approaches.


🥇 Honors and Awards


💬 Talks and Teaching

  • 2023.03, JD AI Team, Talk on Vision-Language Pretraining Using Generative Methods
    Delivered a talk on the application of generative methods in vision-language models (VLMs) for pretraining. The presentation focused on leveraging advanced generative techniques to enhance performance in AI tasks.
  • 2021.08, Xiamen University WISER Club, Data Mining Course
    Designed and led discussions for the Data Mining course, and delivered lectures on clustering and feature extraction.

💻 Skills

  • 💻 Programming Languages: Python, C++, MATLAB, Mathematica
  • 🧠 Deep Learning Frameworks: TensorFlow, PyTorch
  • 📊 Data Analysis: Pandas, NumPy
  • 🌐 Web Development: HTML, CSS, JavaScript, Vue
  • 🛠️ Tools: Git, Docker

🎓 Professional Service

I have served as a reviewer for prestigious conferences and journals, including:

  • Machine Learning & AI: NeurIPS, ICML, ICLR, AAAI, AISTATS
  • Computer Vision & Multimedia: CVPR, ICCV, WACV, MICCAI, ACM MM
  • Journals: IJCV, TIP

🎯 Hobbies & Interests

  • 🎤 Singing: Love belting out tunes and exploring different music styles
  • 🍳 Cooking: Always experimenting with new recipes - my kitchen is my lab!
  • 🏸 Sports: Into badminton, basketball, and table tennis. Also hitting the gym as a total newbie (don’t judge!)
  • 🎲 Board Games: Obsessed with strategy games like Splendor (璀璨宝石), Catan (卡坦岛), Ticket to Ride (车票之旅), Azul (花砖物语), 7 Wonders (世界七大奇迹), Carcassonne (卡卡颂), and Wingspan (展翅翱翔). Game night anyone?
  • ⛰️ Hiking & Traveling: Love chasing sunrises on mountain peaks and getting lost in new cities. Recently explored the misty heights of Huangshan and the sacred temples of Jiuhuashan, and wandered through Zhuhai’s coastlines and Changsha’s food scene. Been lucky enough to haggle in Istanbul’s bazaars, ride camels in Morocco’s Sahara, and get caffeinated in Seoul’s trendy streets.
  • 🎮 Gaming: Not just building Honor of Kings (Lingbao Project developer here!) but also crushing it as a player - rocking National Server Nezha, Golden Badge Nakoruru, and ranked Top 50 Jungler in Hefei. Let’s duo queue sometime!

Feel free to hit me up for karaoke sessions, board games, badminton matches, hiking trips, or just grabbing food and chatting about life! Always down to meet cool people and try new things 🎉

📬 Let’s Connect

  • 📫 You can reach out to me via email at cyd0806@mail.ustc.edu.cn.
  • 💼 I’m eager to connect with fellow deep learning enthusiasts and graduate researchers who share similar interests and are passionate about advancing the frontiers of AI in these domains.
  • 📍 USTC Gaoxin campus, Hefei, Anhui, China

Feel free to drop me an email to discuss potential collaborations, share your ideas, or just have a friendly chat!
