📝 Selected Publications

For a complete list of publications, please visit my Google Scholar profile

📈 View Citation Trend
Citation Trend

Note: * denotes equal contribution

🧠 Self-Supervised Learning & Pretraining 4
ICCV 2025
CCF A
TokenUnify

TokenUnify: Scaling Up Autoregressive Pretraining for Computer Vision Computer VisionSelf-Supervised Learning
ICCV | October 25, 2025
Yinda Chen*; Haoyuan Shi*; Xiaoyu Liu; Te Shi; Ruobing Zhang; Dong Liu; Zhiwei Xiong; Feng Wu

Code Dataset Weights

TokenUnify proposes a hierarchical predictive coding framework for computer vision, reducing autoregressive error from O(K) to O(√K). It introduces a dataset with 1.2 billion annotated voxels and achieves 44% improvement over training from scratch.

ICML 2025
CCF A
MaskTwins

MaskTwins: Dual-form Complementary Masking for Domain-Adaptive Image Segmentation Domain AdaptationPretraining Methods
ICML | July 13, 2025
Jiawen Wang; Yinda Chen* (Theory Contribution & Project Leader); Xiaoyu Liu; Che Liu; Dong Liu; Jianqing Gao; Zhiwei Xiong

Code Poster

MaskTwins introduces a dual-form complementary masking strategy for domain-adaptive image segmentation, effectively bridging the domain gap through coordinated spatial and feature-level masking mechanisms.

IJCAI 2023
CCF A
dbMiM

Self-Supervised Computer Vision with Multi-Agent Reinforcement Learning Computer VisionSelf-Supervised Learning
IJCAI (oral) | August 17, 2023
Yinda Chen; Wei Huang; Shenglong Zhou; Qi Chen; Zhiwei Xiong

Code Pretrain Data CREMI VNC

This paper proposes a decision-based MIM for computer vision segmentation. It uses MARL to optimize masking, outperforming alternatives.

ICASSP 2024
CCF B
MS-Con

Learning multiscale consistency for self-supervised electron microscopy instance segmentation Computer VisionPretraining Methods
ICASSP | April 13, 2024
Yinda Chen; Wei Huang; Xiaoyu Liu; Shiyu Deng; Qi Chen; Zhiwei Xiong

Code

A pretraining framework for volume instance segmentation is proposed. It enforces multiscale consistency and shows good performance in instance segmentation tasks.

🏥 Medical Image Analysis & Vision-Language 4
ICCV Workshop 2025
GTGM

GTGM: Generative Text-Guided 3D Vision-Language Pretraining for Medical Image Segmentation Vision-LanguageMedical Imaging
ICCV Workshop | October 25, 2025
Yinda Chen*; Che Liu*; Wei Huang; Xiaoyu Liu; Haoyuan Shi; Sibo Cheng; Rossella Arcucci; Zhiwei Xiong

Code

GTGM extends Vision-Language Pretraining to 3D medical images by leveraging LLMs to generate synthetic textual descriptions, enabling text-guided representation learning without paired medical text. Combined with a negative-free contrastive learning strategy, GTGM achieves state-of-the-art performance across 10 CT/MRI segmentation datasets.

IEEE JBHI
SCI Q1 | IF: 6.7
EMPOWER

EMPOWER: Evolutionary Medical Prompt Optimization With Reinforcement Learning Vision-LanguageMultimodal Learning
IEEE Journal of Biomedical and Health Informatics | October 16, 2025
Yinda Chen*; Yangfan He*; Jing Yang; Dapeng Zhang; Zhenlong Yuan; Muhammad Attique Khan; Jamel Baili; Por Lip Yee

EMPOWER proposes an evolutionary framework for prompt optimization through specialized representation learning and multi-dimensional evaluation. It achieves 24.7% reduction in factual errors and 15.3% higher preference scores.

MICCAI 2024
CCF B
BIMCV-R

BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval Vision-LanguageMultimodal Learning
MICCAI | October 06, 2024
Yinda Chen; Che Liu; Xiaoyu Liu; Rossella Arcucci; Zhiwei Xiong

Dataset

This paper presents BIMCV-R, a 3D CT text-image retrieval dataset, and MedFinder. Tests show MedFinder outperforms baselines in related tasks.

IEEE TMI
SCI Q1 | IF: 10.6
DADn

Unsupervised Domain Adaptation for EM Image Denoising with Invertible Networks Domain AdaptationImage Denoising
IEEE Transactions on Medical Imaging | July 29, 2024
Shiyu Deng; Yinda Chen; Wei Huang; Ruobing Zhang; Zhiwei Xiong

Code

The paper proposes an unsupervised domain adaptation method for EM image denoising with invertible networks, outperforming existing methods.

📦 Image Compression 2
IEEE TPAMI
SCI Q1 | IF: 20.8
GRCL Framework

Learned Image Coding with Generative Reference of Conditional Latents Image CompressionGenerative Models
IEEE Transactions on Pattern Analysis and Machine Intelligence | Accepted, 2025
Siqi Wu*; Yinda Chen*; Weiming Chen; Dong Liu; K. C. Ho; Zhihai He

Code

GRCL presents a generic framework that exploits semantically correlated external images as conditional coding references in the latent domain. Three reference generation methods are investigated: local dictionary retrieval, web-based image search, and diffusion-based image-text-image generation. Theoretical analysis proves robustness to reference perturbations via subspace recovery error bounds. Achieves up to 1.5 dB PSNR gain over state-of-the-art methods with only ~0.005 bpp overhead.

AAAI 2025
CCF A
CLC

Condition-generation Latent Coding with an External Dictionary for Deep Image Compression Image Compression
AAAI (oral) | March 06, 2025
Siqi Wu; Yinda Chen*; Dong Liu; Zhihai He

Code Weights

The paper proposes CLC for deep image compression. It uses a dictionary to generate references, shows good performance, and has theoretical analysis.

🎨 Image Segmentation & Synthesis 1
NeurIPS 2024
CCF A
MaskFactory

MaskFactory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation Multimodal Learning
NeurIPS | October 17, 2024
Haotian Qian; Yinda Chen*; Shengtao Lou; Fahad Shahbaz Khan; Xiaogang Jin; Deng-Ping Fan

Project Code

MaskFactory proposes a two-stage method to generate high-quality synthetic datasets for DIS, outperforming existing methods in quality and efficiency.