Fangyi Chen's Homepage

About Me

I am a Research Scientist in Generative AI at Intelligent Creation, ByteDance. My work focuses on post-training of multimodal large language models, with an emphasis on scene/video understanding and AI agents.

I received my Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University in 2025, where I was affiliated with the CyLab Security & Privacy Institute and advised by Prof. Marios Savvides. Prior to that, I obtained my M.S. from the University of Pittsburgh in 2018, advised by Prof. Zhi-Hong Mao, and my B.E. from North China Electric Power University in 2017.

ByteDance

Research Scientist (Intelligent Creation)

Jul. 2025 - Present
Carnegie Mellon University

Ph.D.

Jan. 2020 - May 2025
CMU CyLab

Research Associate, Jan. 2019 - Nov. 2019
University of Pittsburgh

M.S.

Aug. 2017 - Dec. 2018
North China Electric Power University

B.E.

Sept. 2013 - May 2017

Publications

Referring Layer Decomposition

Fangyi Chen, Yaojie Shen, Lu Xu, Ye Yuan, Shu Zhang, Yulei Niu and Longyin Wen

International Conference on Learning Representations (ICLR), 2026

paper

code

MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption

Chen Li, Zhantao Yang, Han Zhang, Fangyi Chen, Chenchen Zhu, Anudeep Bolimera and Marios Savvides

International Conference on Learning Representations (ICLR), 2026

paper

code

STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision

Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, Zihan Wang, Anudeep Bolimera and Marios Savvides

AAAI Conference on Artificial Intelligence (AAAI), 2026

paper

code

Masked Autoencoders Are Effective Tokenizers for Diffusion Models

Hao Chen, Yujin Han, Fangyi Chen, Xiang Li, Yidong Wang, Jindong Wang, Ze Wang, Zicheng Liu, Difan Zou and Bhiksha Raj

International Conference on Machine Learning (ICML), 2025

paper

code

SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu and Emad Barsoum

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

paper

code

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection

Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu and Marios Savvides

Preprint

paper

code

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Fangyi Chen, Han Zhang, Kai Hu, Yukai Huang, Chenchen Zhu and Marios Savvides

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023

paper

code

Unitail: Detecting, Reading, and Matching in Retail Scene

Fangyi Chen, Han Zhang, Zaiwang Li, Jiachen Dou, Shentong Mo, Hao Chen, Yongxin Zhang, Uzair Ahmed, Chenchen Zhu and Marios Savvides

European Conference on Computer Vision (ECCV), 2022

paper

project website

code

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

Chenchen Zhu, Fangyi Chen, Uzair Ahmed, Zhiqiang Shen and Marios Savvides

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

paper

Soft Anchor-Point Object Detection

Chenchen Zhu, Fangyi Chen, Zhiqiang Shen and Marios Savvides

European Conference on Computer Vision (ECCV), 2020

paper

NCMS: Towards accurate anchor free object detection through 𝓁2 norm calibration and multi-feature selection

Fangyi Chen, Chenchen Zhu, Zhiqiang Shen, Han Zhang and Marios Savvides

Computer Vision and Image Understanding (CVIU), Volume 200, 103050

paper

Solving missing-annotation object detection with background recalibration loss

Han Zhang, Fangyi Chen, Zhiqiang Shen, Qiqi Hao, Chenchen Zhu and Marios Savvides

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

paper

code

Patents

US12189714B2

System and method for improved few-shot object detection using a dynamic semantic network
US20250182450A1

System and method for weapon detection with pose estimation
US12266156B2

System and method for solving missing annotation object detection
US20250005881A1

System and method for assigning complex concave polygons as bounding boxes
US12131497B2

Fast object search based on the cocktail party effect
US20240355085A1

System and method for matching products and determining spreads and plugs
US11915463B2

System and method for the automatic enrollment of object images into a gallery
WO2020210825A1

System and method for detecting products and product labels
WO2022211995A1

System and method for using non-axis aligned bounding boxes for retail detection
WO2022109295A1

System and method for detecting and classifying abnormal cells
US11954175

Feature pyramids for object detection
US2022058432A1

Few-shot object detection using semantic relation reasoning

Academic Services

Conference Reviewer:

• CVPR • ICCV • ECCV • NeurIPS • ICLR • ICML

Journal Reviewer:

• TIP • IJCV • PR • TGRS • NeuralComputing

Teaching

Graduate Teaching Assistant