Chao Huang's Homepage

Chao Huang

I am a fifth-year PhD candidate in the Department of Computer Science at the University of Rochester, advised by Prof. Chenliang Xu. Previously, I spent one wonderful year as a research assistant at the Chinese University of Hong Kong, working with Prof. Chi-Wing Fu on 3D vision. I received my B.Eng. from the ESE Department, Nanjing University, in 2019. As an undergraduate, I worked with Prof. Zhan Ma on image compression.

I work on multimodal learning and generation. Recently, I have been particularly interested in leveraging large language models (LLMs) to enhance multimodal understanding and generation.

Research opportunities: I am open to collaborating on research projects. Shoot me an email if you are interested.

✉️ I'm currently seeking full-time opportunities. Please feel free to reach out if you have any openings!

Email / CV / Google Scholar

News

[04/2025]	I am co-organizing the 🔒 TrustFM: Workshop on Trustworthy Foundation Models @ ICCV 2025!
[04/2025]	Video Understanding with Large Language Models: A Survey is accepted to IEEE TCSVT!
[02/2025]	Two papers accepted to CVPR 2025! See you in Nashville 🎸.
[12/2024]	🏆 DAVIS won the ACCV 2024 Best Paper Award, Honorable Mention!
[09/2024]	Two papers accepted to ACCV 2024, with DAVIS as an oral presentation. See you in Hanoi, Vietnam 🍜.
[07/2024]	Acoustic Primitives is accepted to ECCV 2024! See you in Milan ⛪.
[05/2024]	I have rejoined Meta Reality Labs as a summer research intern, this time based in the UK.
[09/2023]	One paper accepted to NeurIPS 2023!
[06/2023]	Invited paper talk at Joint International 3rd Ego4D and 11th EPIC Workshop @ CVPR 2023.
[03/2023]	I will be joining Meta Reality Labs Pittsburgh for a summer internship!
[02/2023]	One paper accepted to CVPR 2023!

Research

	⭐ ZeroSep: Separate Anything in Audio with Zero Training Chao Huang, Yuesheng Ma, Junxuan Huang, Susan Liang, Yunlong Tang, Jing Bi, Wenqiang Liu, Nima Mesgarani, Chenliang Xu arXiv, 2025 Paper / Project Page / Code No fine-tuning, no task-specific data, just latent inversion + text-conditioned denoising to isolate any sound you describe.
	⭐ FreSca: Scaling in Frequency Space Enhances Diffusion Models Chao Huang, Susan Liang, Yunlong Tang, Li Ma, Yapeng Tian, Chenliang Xu CVPR GMCV, 2025 Paper / Project Page / Code Where and why you should care about frequency space in diffusion models.
	🔥 Learning to Highlight Audio by Watching Movies Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh CVPR, 2025 Paper / Project Page / Code / Dataset We learn from movies to transform audio to deliver appropriate highlighting effects guided by the accompanying video.
	Video Understanding with Large Language Models: A Survey Yunlong Tang, ... , Chao Huang, ... , Ping Luo, Jiebo Luo, Chenliang Xu IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025 Paper / Project Page A survey of recent large language models for video understanding.
	VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu CVPR, 2025 Paper / Project Page / Code We introduce VidComposition, a benchmark designed to assess MLLMs' understanding of video compositions.
	Scaling Concept with Text-Guided Diffusion Models Chao Huang, Susan Liang, Yunlong Tang, Yapeng Tian, Anurag Kumar, Chenliang Xu arXiv preprint, 2024 Paper / Project Page / Code We use pretrained text-guided diffusion models to scale up/down concepts in image/audio.
	DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu ACCV, 2024 🏆 Best Paper Award, Honorable Mention Paper / Project Page / Code A new take on the audio-visual separation problem using recent generative diffusion models.
	Language-Guided Joint Audio-Visual Editing Via One-Shot Adaptation Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu ACCV, 2024 Paper / Project Page / Dataset We achieve joint audio-visual editing under language guidance.
	Modeling and Driving Human Body Soundfields through Acoustic Primitives Chao Huang, Dejan Markovic, Chenliang Xu, Alexander Richard ECCV, 2024 Paper / Project Page Thinking of the equivalent of 3D Gaussian Splatting and volumetric primitives for the human body soundfield? Here, we introduce Acoustic Primitives.
	AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu NeurIPS, 2023 Paper / Project Page / Code We propose a novel method of synthesizing real-world audio-visual scenes at novel positions and directions.
	Egocentric Audio-Visual Object Localization Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu CVPR, 2023 Paper / Code We explore the problem of sound source visual localization in egocentric videos, propose a new localization method and establish a benchmark for evaluation.
	Non-Local Part-Aware Point Cloud Denoising Chao Huang, Ruihui Li, Xianzhi Li, Chi-Wing Fu arXiv preprint, 2020 A non-local attention based method for point cloud denoising in both synthetic and real scenes.
	Extreme Image Compression via Multiscale Autoencoders With Generative Adversarial Optimization Chao Huang, Haojie Liu, Tong Chen, Qiu Shen, Zhan Ma IEEE Visual Communications and Image Processing (VCIP), 2019 (Oral Presentation) An image compression system for extreme conditions, e.g., < 0.05 bits per pixel (bpp).

Education

	University of Rochester, NY, USA Ph.D. in Computer Science Jan. 2021 - Present Advisor: Chenliang Xu
	Nanjing University, Nanjing, China B.Eng. in Electronic Science and Engineering Sept. 2015 - Jun. 2019

Experience

	Meta Reality Labs Research, Meta, Cambridge, UK Research Scientist Intern May 2024 - Aug. 2024 Mentors: Sanjeel Parekh, Ruohan Gao, Anurag Kumar
	Codec Avatars Lab, Meta, Pittsburgh Research Scientist Intern May 2023 - Nov. 2023 Mentors: Dejan Markovic, Alexander Richard
	The Chinese University of Hong Kong, Shatin, Hong Kong Research Assistant Jul. 2019 - Dec. 2020 Advisor: Chi-Wing Fu

Professional Service

Workshop Organizer:	TrustFM: Workshop on Trustworthy Foundation Models @ ICCV 2025
Conference Reviewer:	CVPR (2023 - 2025), AAAI (2023 - 2025), ACM MM (2023 - 2025), ICCV (2025)
Journal Reviewer:	TMM, TIP, SIGGRAPH

The template is based on Jon Barron's website.