I am Yinda Chen (陈胤达), a Ph.D. candidate jointly trained by the University of Science and Technology of China (USTC) and the Shanghai AI Lab, majoring in Information and Communication Engineering.

👀 My research develops predictive multimodal intelligence: from representation and predictive learning to world models for embodied perception, future-state prediction, and action. Recent work in Pelican-Unified and its Pelican-VLA 0.5 module connects unified multimodal representations, future imagination, and closed-loop robot control.
🧭 I am advised by Prof. Feng Wu (吴枫, CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong (熊志伟) at USTC, and co-supervised by Prof. Xiaoou Tang (汤晓鸥, IEEE Fellow) at the Shanghai AI Lab.


🤝 I have the honor of collaborating closely with Prof. Dong Liu (刘东), Prof. Li Li (李礼), and Prof. Zhihai He (何志海, IEEE Fellow). I have also completed research internships at Imperial College London with Dr. Rossella Arcucci and Dr. Che Liu (刘澈), and at the PLA General Hospital (301 Hospital) with Prof. Qionghai Dai (戴琼海, CAE Academician, IEEE Fellow).
🌱 Currently, I’m a core member at the Beijing Humanoid Robot Innovation Center (UBTECH Robotics), focusing on embodied intelligence world models and unified models for humanoid robots under the mentorship of Dr. Yong Dai (代勇). I also have research experience with Ubiquant (九坤投资), focusing on quantitative research and large language model foundation training, with exposure to thousand-GPU-scale clusters. Previously, I was an intern at Tencent IEG (Interactive Entertainment Group) through the Qingyun Program, where I worked on game world models and helped deliver Honor of Kings – Lingbao, focusing on TTS (Text-to-Speech) and real-time match commentary under the guidance of Dr. Liang Du (杜量) and Dr. Wentao Yao (姚文韬).
💞️ Coming from a non-traditional background with dual bachelor’s degrees in Environmental Ecological Engineering and Economics at Xiamen University, I’ve always been fascinated by interdisciplinary research. During my undergraduate years, I had the privilege of being mentored by Prof. Yuanye Zhang (张原野), whose passion and rigor guided me into my initial research endeavors. My journey into AI began in 2021 as an active member of WISERCLUB, a data science community that shaped my early curiosity—I’m deeply grateful to my good friend Ziqian Lin (林子谦) (now at Peking University Guanghua School of Management, Department of Statistics) for recommending me to this amazing community. As someone who transitioned across fields, I deeply understand the challenges of learning something entirely new, and this experience has made me incredibly grateful for every mentor and collaborator who has guided me along the way.
📝 Thanks to the generous guidance of my mentors and the support of amazing collaborators, I’ve been fortunate to publish papers at conferences including ICLR, ICML, ICCV, NeurIPS, AAAI, ACL, IJCAI, and MICCAI, and journals including TPAMI, TMI, and JBHI.
📄 My CV is available in Chinese and English.

I know my talents may not match those of many brilliant pioneers in this field, but I believe in the power of continuous learning and genuine collaboration. I’m always eager to exchange ideas, seek advice from the community, and help those who are just starting their research journey. Please feel free to reach out—I’d be truly honored to connect, collaborate, or simply share thoughts about research! 🌟

🎯 Research Areas

🔥 News

2026.07 🤖 Pelican-VLA 0.5: Attending Before Acting Benefits Generalization technical report released. As the VLA component within Pelican-Unified, it uses learnable bottleneck tokens to connect task-relevant perception, future-frame generation, and action generalization.
2026.06 📄 One paper was accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology).
2026.05 📄 Core contributor (ranked second) to Pelican-Unified 1.0, which unifies understanding, reasoning, imagination, and action.
2026.05 📄 Three papers were accepted by ICML 2026. See you in Seoul, South Korea!
2026.04 🎵 Earned the Singing Popular Music certification from Berklee College of Music.
2026.01 🎓 Awarded the Zenghua Scholarship (增华奖学金) by USTC, a university-level scholarship.
2026.01 📄 Two papers were accepted by ICLR 2026. See you in Rio de Janeiro, Brazil!
2026.01 ✈️ Traveling to Singapore for AAAI 2026. Friends interested in embodied intelligence, world models, and medical imaging are welcome to connect!
2026.01 📄 One paper on medical image registration was accepted by ICASSP 2026. See you in Barcelona, Spain in May!

2025.12 💼 Joined Beijing Humanoid Robot Innovation Center (UBTECH Robotics) as a core member, focusing on embodied intelligence world models for humanoid robots.
2025.11 🎉 My advisor Prof. Feng Wu (吴枫) was elected as an Academician of the Chinese Academy of Engineering (CAE). A role model forever.
2025.11 📄 One paper was accepted by AAAI 2026.
2025.11 📄 One paper was accepted by TCSVT (IEEE Transactions on Circuits and Systems for Video Technology).
2025.10 📄 One paper was accepted by JBHI (IEEE Journal of Biomedical and Health Informatics).
2025.08 💼 Started as a Qingyun Program intern at Tencent IEG, working on game video scene understanding.
2025.06 📄 One paper was accepted by ICCV 2025.
2025.05 📄 One paper was accepted to Findings of ACL 2025.
2025.05 📄 One paper was accepted by ICML 2025.
2025.01 📄 One paper was selected for oral presentation at AAAI 2025.
2024.12 🏅 Selected as Principal Investigator for the National Natural Science Foundation of China (NSFC) Ph.D. Program.
2024.10 📄 One paper was accepted by NeurIPS 2024.

📖 Education

🎓 Academic Background

University of Science and Technology of China & Shanghai AI Lab

Ph.D. in Information and Communication Engineering (Expected 2027)

📍 Hefei & Shanghai · Sept 2024 - Present

Research focus: Machine Learning Theory, Self-Supervised Pretraining, Multimodal Large Models, Image Coding & Compression
Working with Prof. Feng Wu (CAE Academician, IEEE Fellow) and Prof. Zhiwei Xiong
Co-supervised by Prof. Xiaoou Tang (IEEE Fellow) at Shanghai AI Lab
Selected coursework: Algorithm Design and Analysis, Statistical Learning, Deep Learning, Reinforcement Learning
Principal Investigator for NSFC Ph.D. Project (2024)

University of Science and Technology of China

M.S. in Computer Technology

📍 Hefei · Sept 2022 - July 2024

Recipient of National Graduate Scholarship (2022)

Xiamen University

B.S. in Environmental Ecological Engineering & Economics (Dual Degree)

📍 Xiamen · Sept 2018 - July 2022

Academic ranking: 1st/31 overall
Xiamen University Academic Star (2021), CDA Level 1 Certification (2022), Kaggle Expert
Research advisor: Prof. Yuanye Zhang

💼 Professional Experience

🏢 Industry Experience

Beijing Humanoid Robot Innovation Center (UBTECH Robotics)

Core Member, Embodied Intelligence World Model Algorithm Team

📍 Beijing · Dec 2025 - Present

Developing unified embodied world models for humanoid robots, connecting task-relevant perception, future-state prediction, and low-level action generation in real-robot closed loops
Contributing to Pelican-Unified pretraining and planning validation, including the VLA module's bottleneck representations for cross-scene and cross-embodiment manipulation generalization
Core contributor (ranked second) to Pelican-Unified 1.0; participated in the Pelican-VLA 0.5 technical report on its VLA component

Ubiquant (九坤投资)

Researcher, Quantitative Research and Large Language Model Foundation Training

📍 Beijing · May 2026 - Present

Working on quantitative research and large language model foundation training
Experienced with thousand-GPU-scale clusters and exploring LLM applications in factor mining and data-driven decision making

Tencent Interactive Entertainment Group (IEG)

Qingyun Program Intern

📍 Shanghai · Aug 2025 - Dec 2025

Developed TTS (Text-to-Speech) systems and multimodal scene understanding for Honor of Kings – Lingbao and League of Legends highlight commentary
Supervised by Dr. Liang Du and Dr. Wentao Yao
Designed TTS models for generating game commentary and character voiceovers, exploring generation-understanding synergy
Developed multimodal algorithms to analyze game video content and automatically generate contextual commentary scripts
Co-developed MAIN-VLA: A game world model for Game for Peace (和平精英), modeling abstraction of intention and environment for Visual-Language-Action in complex gaming scenarios

Chinese PLA General Hospital (301 Hospital)

Research Intern, Data Compression Group

📍 Beijing · Sept 2023 - Feb 2024

Collaborated with the team of Prof. Qionghai Dai (CAE Academician, IEEE Fellow) on efficient data compression research
Designed image-specific compression algorithms for various data modalities
Achieved a 35% improvement in compression efficiency

🎓 Academic Experience

Imperial College London

Research Intern, Data Science Institute

📍 London (Remote) · Nov 2022 - Aug 2023

Collaborated with Dr. Rossella Arcucci and Dr. Che Liu on multimodal pretraining research
Developed an image-text contrastive learning framework achieving 93.5% accuracy on classification tasks
Submitted one journal paper

Xiamen University WISER Club

Insider, Data Mining Group

📍 Xiamen · Aug 2021 - July 2022

Designed and led data mining courses, delivering lectures on clustering and Transformer architectures
Mentored 20 undergraduate students in machine learning projects and organized 2 campus-wide competitions
Media coverage

Wang Yanan Institute for Studies in Economics, Xiamen University

Research Assistant, Econometrics

📍 Xiamen · Aug 2020 - Dec 2021

Assisted Associate Prof. Jiong Zhu in national land economic statistics research
Conducted visual feature extraction for homestead information and land use analysis
Developed satellite imagery analysis tools achieving 85% accuracy in identifying land use changes

📝 Selected Research

For a complete list of publications, please visit my Google Scholar profile


Note: * denotes equal contribution

🧠 Representation and Predictive Learning

ICCV 2025

CCF A

TokenUnify: Scaling Up Autoregressive Pretraining for Computer Vision
Computer Vision | Self-Supervised Learning
ICCV | October 25, 2025
Yinda Chen*; Haoyuan Shi*; Xiaoyu Liu; Te Shi; Ruobing Zhang; Dong Liu; Zhiwei Xiong; Feng Wu

Code

Dataset

Weights

TokenUnify proposes a hierarchical predictive coding framework for computer vision, reducing autoregressive error from O(K) to O(√K). It introduces a dataset with 1.2 billion annotated voxels and achieves 44% improvement over training from scratch.

ICML 2025

CCF A

MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation
Domain Adaptation | Pretraining Methods
ICML | July 13, 2025
Jiawen Wang; Yinda Chen* (Theory Contribution & Project Leader); Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong

Code

Poster

MaskTwins introduces a dual-form complementary masking strategy for domain-adaptive image segmentation, effectively bridging the domain gap through coordinated spatial and feature-level masking mechanisms.

IJCAI 2023

CCF A

Self-Supervised Computer Vision with Multi-Agent Reinforcement Learning
Computer Vision | Self-Supervised Learning
IJCAI (oral) | August 17, 2023
Yinda Chen; Wei Huang; Shenglong Zhou; Qi Chen; Zhiwei Xiong

Code

Pretrain Data

CREMI

VNC

This paper proposes a decision-based masked image modeling (MIM) approach for segmentation in computer vision. It uses multi-agent reinforcement learning (MARL) to optimize the masking strategy and outperforms alternative approaches.

ICASSP 2024

CCF B

Learning multiscale consistency for self-supervised electron microscopy instance segmentation
Computer Vision | Pretraining Methods
ICASSP | April 13, 2024
Yinda Chen; Wei Huang; Xiaoyu Liu; Shiyu Deng; Qi Chen; Zhiwei Xiong

Code

This paper proposes a pretraining framework for volume instance segmentation that enforces multiscale consistency and performs well on instance segmentation tasks.

✨ Multimodal and Generative Models

IEEE TPAMI

SCI Q1 | IF: 20.8

Learned Image Coding with Generative Reference of Conditional Latents
Image Compression | Generative Models
IEEE Transactions on Pattern Analysis and Machine Intelligence | Accepted, 2025
Siqi Wu*; Yinda Chen*; Weiming Chen; Dong Liu; K. C. Ho; Zhihai He

Code

GRCL presents a generic framework that exploits semantically correlated external images as conditional coding references in the latent domain. Three reference generation methods are investigated: local dictionary retrieval, web-based image search, and diffusion-based image-text-image generation. Theoretical analysis proves robustness to reference perturbations via subspace recovery error bounds. GRCL achieves up to 1.5 dB PSNR gain over state-of-the-art methods with only ~0.005 bpp overhead.

AAAI 2025

CCF A

Condition-generation Latent Coding with an External Dictionary for Deep Image Compression
Image Compression
AAAI (oral) | March 06, 2025
Siqi Wu; Yinda Chen*; Dong Liu; Zhihai He

Code

Weights

This paper proposes Condition-generation Latent Coding (CLC) for deep image compression. CLC uses an external dictionary to generate references, performs well, and is supported by theoretical analysis.

ICCV Workshop 2025

GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image Segmentation
Vision-Language | Medical Imaging
ICCV Workshop | October 25, 2025
Yinda Chen*; Che Liu*; Wei Huang; Xiaoyu Liu; Haoyuan Shi; Sibo Cheng; Rossella Arcucci; Zhiwei Xiong

Code

GTGM extends Vision-Language Pretraining to 3D medical images by leveraging LLMs to generate synthetic textual descriptions, enabling text-guided representation learning without paired medical text. Combined with a negative-free contrastive learning strategy, GTGM achieves state-of-the-art performance across 10 CT/MRI segmentation datasets.

MICCAI 2024

CCF B

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
Vision-Language | Multimodal Learning
MICCAI | October 06, 2024
Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong

Dataset

This paper presents BIMCV-R, a dataset for 3D CT text-image retrieval, together with MedFinder, a retrieval method. Experiments show that MedFinder outperforms baselines on related tasks.

NeurIPS 2024

CCF A

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Multimodal Learning
NeurIPS | October 17, 2024
Haotian Qian; Yinda Chen*; Shengtao Lou; Fahad Shahbaz Khan; Xiaogang Jin; Deng-Ping Fan

Project

Code

MaskFactory proposes a two-stage method for generating high-quality synthetic datasets for dichotomous image segmentation (DIS), outperforming existing methods in quality and efficiency.

IEEE TMI

SCI Q1 | IF: 10.6

Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks
Domain Adaptation | Image Denoising
IEEE Transactions on Medical Imaging | July 29, 2024
Shiyu Deng; Yinda Chen; Wei Huang; Ruobing Zhang; Zhiwei Xiong

Code

This paper proposes an unsupervised domain adaptation method for electron microscopy (EM) image denoising based on invertible networks, outperforming existing methods.

IEEE JBHI

SCI Q1 | IF: 6.7

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning
Vision-Language | Multimodal Learning
IEEE Journal of Biomedical and Health Informatics | October 16, 2025
Yinda Chen*; Yangfan He*; Jing Yang; Dapeng Zhang; Zhenlong Yuan; Muhammad Attique Khan; Jamel Baili; Por Lip Yee

EMPOWER proposes an evolutionary framework for prompt optimization through specialized representation learning and multi-dimensional evaluation. It achieves a 24.7% reduction in factual errors and 15.3% higher preference scores.

🤖 World Models and Embodied Intelligence

Technical Report 2026

Pelican-Unified 1.0: A Unified Embodied Intelligence Model (UEI) for Understanding, Reasoning, Imagination and Action
Embodied Intelligence | World Models | Unified Models
arXiv Technical Report | May 14, 2026
Beijing Innovation Center of Humanoid Robotics (X-Humanoid), WFM System Group; Yinda Chen (core contributor, ranked second)

PDF

arXiv

Hugging Face

Official

People’s Daily (人民日报)

Beijing Daily / Beijing Gov (北京日报 / 北京市政府)

Machine Heart (机器之心)

Zhidongxi (智东西)

Pelican-Unified 1.0, also reported as Pelican-Unify 1.0, unifies understanding, reasoning, future imagination, and action in one embodied intelligence loop. It uses a single VLM for scene/instruction understanding and task-oriented reasoning, plus a Unified Future Generator that jointly predicts future videos and actions in the same denoising process.

It ranks first on WorldArena with 66.03 EWM, reaches 98.12% 3D accuracy, achieves 93.5 average success on RoboTwin, and scores 64.7 across eight VLM benchmarks among comparable-scale models. The model has been validated on UR5e arms and the Tiangong humanoid robot for zero-shot long-horizon tasks such as interface insertion, waterproofing, and object manipulation.

Technical Report 2026

Pelican-VLA 0.5 attention visualizations

Pelican-VLA 0.5: Attending Before Acting Benefits Generalization
Vision-Language-Action | Embodied Intelligence | World Models
Technical Report | July 2026
Beijing Innovation Center of Humanoid Robotics (X-Humanoid), WFM System Group; Yinda Chen

Technical Report

PDF

Code

Models

As the concrete VLA component within the Pelican-Unified framework, Pelican-VLA 0.5 unifies vision-language understanding, future-frame generation, and action prediction. Learnable Bottleneck Tokens route task-relevant visual information to the action pathway, producing object- and contact-centric attention that generalizes across scenes and robot embodiments.

Together with Pelican-Unified 1.0, it forms a world-model narrative from unified multimodal representations, through future-state modeling, to closed-loop action.

🥇 Honors and Awards

🏅 National Awards & Research Funding

National Natural Science Foundation of China (NSFC) PhD Program (December 2024)
- Role: Principal Investigator
- Achievement: Sole awardee in Information Science, Anhui Province
- Description: Prestigious national research funding program for doctoral students
Interdisciplinary Contest in Modeling (ICM), Outstanding Winner (May 2024)
- Competition: International mathematical modeling competition organized by COMAP
- Ranking: Top 0.17% of 10,388 participating teams worldwide
- Resources: 📝 Paper & Code
National Graduate Scholarship (国家奖学金) (December 2022)
- Achievement: National award for academic performance and research contributions (Top 1%)
- Reference: 🔗 Official Announcement
National Undergraduate Mathematics Competition (Non-Major Category), Second Prize (May 2021)
- Competition: National finals organized by the Chinese Mathematical Society
- Progress: Advanced from provincial first place
- Reference: 🔗 Competition News
National Undergraduate Mathematics Competition (Non-Major Category), First Prize (November 2020)
- Competition: High-level national mathematics competition for undergraduate students
- Ranking: Provincial First Place, Fujian Province

🎓 Academic Honors & Scholarships

Zenghua Scholarship (增华奖学金) (January 2026)
- Level: University-level scholarship, USTC
- Achievement: Awarded for outstanding academic performance and research contributions
Xiamen University Academic Star (学术之星) (December 2021)
- Achievement: Sole undergraduate awardee university-wide
- Recognition: Outstanding academic achievements and research excellence
- Media: 🔗 University News | Feature Report

🏆 Competition Awards

"Jingrun Cup" Mathematics Competition (Professional Category), First Prize (September 2021)
- Level: Campus-level competition named after renowned mathematician Chen Jingrun
- Ranking: University First Place
- Reference: 🔗 Competition News
"Internet+" Innovation and Entrepreneurship Competition, Gold Medal (August 2021)
- Level: Provincial level competition
- Region: Fujian Province
"Challenge Cup" National Undergraduate Academic Science and Technology Competition, First Prize (May 2021)
- Level: Provincial level competition
- Region: Fujian Province

💬 Talks, Skills & Service

🎤 Invited Talks

2023.03

Vision-Language Pretraining Using Generative Methods
JD AI Team | Talk on leveraging generative techniques in VLMs for pretraining

2021.08

Data Mining Course: Clustering and Feature Extraction
Xiamen University, WISER Club | Designed and delivered data mining lectures

💻 Technical Skills

💻 Programming

Python, C, C++, Java, MATLAB, LaTeX, Mathematica

🧠 Deep Learning

PyTorch, TensorFlow, DeepSpeed, DDP

📊 Data & Analysis

Pandas, NumPy

🌐 Web Development

HTML, CSS, JavaScript, Vue

🛠️ Tools & Infrastructure

Git, Docker, CUDA, HPC

🌍 Languages

English (TOEFL 110, GRE 328); Chinese (Native)

🎓 Professional Service

Computer Vision & Multimedia

CVPR 2025, ICCV 2025, WACV 2026, MICCAI 2025, ACM MM 2024

Machine Learning & AI

NeurIPS 2024, ICML 2025, ICLR 2024, AAAI, AISTATS 2024

Journals

IJCV, TIP

🎯 Hobbies & Interests

🎤 Singing

Love belting out tunes and exploring different music styles. Certified in Singing Popular Music by Berklee College of Music.

🍳 Cooking

Always experimenting with new recipes — my kitchen is my lab!

🏸 Sports

Into badminton, basketball, and table tennis. Also hitting the gym as a total newbie!

🎲 Board Games

Obsessed with Splendor, Catan, Ticket to Ride, Azul, 7 Wonders, Carcassonne, and Wingspan. Game night anyone?

⛰️ Hiking & Traveling

Love chasing sunrises on mountain peaks. Explored Huangshan, Jiuhuashan, Zhuhai, Changsha, Istanbul, Morocco's Sahara, Seoul, and Singapore.

🎮 Gaming

Not just building Honor of Kings (Lingbao developer!) but rocking National Server Nezha, Golden Badge Nakoruru, and ranked Top 50 Jungler in Hefei.

Feel free to hit me up for karaoke, board games, badminton, hiking, or just grabbing food and chatting about life!

📬 Let's Connect

Get in Touch

📫 Email: cyd0806@mail.ustc.edu.cn

💼 I'm eager to connect with fellow deep learning enthusiasts and researchers passionate about advancing AI.

📍 USTC Gaoxin campus, Hefei, Anhui, China
