Chao Huang

I am a third-year PhD student in the Department of Computer Science at the University of Rochester, advised by Prof. Chenliang Xu. Previously, I spent one wonderful year as a research assistant at the Chinese University of Hong Kong, working with Prof. Chi-Wing Fu on 3D vision. I received my B.Eng. from the ESE Department at Nanjing University in 2019. As an undergraduate, I worked with Prof. Zhan Ma on image compression.

I am broadly interested in developing machine learning models to understand how humans perceive the surrounding scenes from multi-modal inputs and utilize that perception for action. Specifically, I am working on audio-visual scene understanding, egocentric videos, and 3D vision.

Research opportunities: I am open to collaborating on research projects. Shoot me an email if you are interested.

I am actively looking for research internships for Summer 2024. Feel free to contact me if you are interested.

Email  /  CV  /  Google Scholar

News
[09/2023] One paper accepted by NeurIPS 2023!
[06/2023] Invited paper talk at the Joint International 3rd Ego4D and 11th EPIC Workshop @ CVPR 2023.
[03/2023] I will be joining Meta Reality Labs Pittsburgh for a summer internship!
[02/2023] One paper accepted by CVPR 2023!
Research
Video Understanding with Large Language Models: A Survey
Yunlong Tang*, Jing Bi*, Siting Xu*, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Feng Zheng, Jianguo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu (the order of authorship is determined randomly, except for authors marked with *)
arXiv preprint, 2023
Paper / Project Page

A survey of recent large language models for video understanding.

AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
NeurIPS, 2023
Paper / Project Page / Code

We propose a novel method of synthesizing real-world audio-visual scenes at novel positions and directions.

DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
Chao Huang, Susan Liang, Yapeng Tian, Anurag Kumar, Chenliang Xu
arXiv preprint, 2023
Paper / Project Page

A new take on the audio-visual separation problem using recent generative diffusion models.

Egocentric Audio-Visual Object Localization
Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu
CVPR, 2023
Paper / Code

We explore the problem of sound source visual localization in egocentric videos, propose a new localization method, and establish a benchmark for evaluation.

Non-Local Part-Aware Point Cloud Denoising
Chao Huang*, Ruihui Li*, Xianzhi Li, Chi-Wing Fu
arXiv preprint, 2020

A non-local attention-based method for point cloud denoising in both synthetic and real scenes.

Extreme Image Compression via Multiscale Autoencoders With Generative Adversarial Optimization
Chao Huang, Haojie Liu, Tong Chen, Qiu Shen, Zhan Ma
IEEE Visual Communications and Image Processing (VCIP), 2019   (Oral Presentation)

An image compression system for extreme conditions, e.g., < 0.05 bits per pixel (bpp).

Education
University of Rochester, NY, USA
Ph.D. in Computer Science
Jan. 2021 - Present
Advisor: Chenliang Xu
Nanjing University, Nanjing, China
B.Eng. in Electronic Science and Engineering
Sept. 2015 - Jun. 2019
Experience
Meta Reality Labs Research, Pittsburgh
Research Scientist Intern
May 2023 - Nov. 2023
Advisor: Dejan Markovic
The Chinese University of Hong Kong, Shatin, Hong Kong
Research Assistant
Jul. 2019 - Dec. 2020
Advisor: Chi-Wing Fu

The template is based on Jon Barron's website.