Chao Huang

I teach machines to see, hear, and reason as one.

omni-modal LLMs · multimodal reasoning · audio-visual generation

Research Scientist, Tencent Hunyuan

I build multimodal systems that perceive, reason, and create across vision, language, and sound. My recent focus is faithful cross-modal reasoning in LLMs and few-shot generation with diffusion priors. I value work that is measurable and deployable.

My research centers on omni-modal reasoning and generation, with a preference for leveraging pretrained foundation models (LLMs, diffusion models) over task-specific architectures. I focus on problems where quality and real-world scalability both matter.

Ph.D. in Computer Science, University of Rochester · B.Eng., Nanjing University

News

Apr 2026 DRIFT accepted to ACL 2026 Findings.
Mar 2026 Successfully defended my Ph.D. dissertation.
Mar 2026 Gave invited talks on efficient adaptation of foundation models at Adobe Research and Simon Fraser University.
Feb 2026 Three papers accepted to CVPR 2026 (including one at Findings).
Jan 2026 XModBench accepted to ICLR 2026. CAT-V received AAAI 2026 Best Demo Award, Runner-up.
Jan 2026 Joined Tencent Hunyuan as a Research Scientist.
Earlier
Oct 2025 Received the NeurIPS 2025 Scholar Award.
Sep 2025 Three papers accepted to NeurIPS 2025; one to IJCV.
Aug 2025 Selected for ICCV 2025 Doctoral Consortium.
Jun 2025 One paper accepted to ICCV 2025.
Feb 2025 Two papers accepted to CVPR 2025.
Dec 2024 DAVIS received ACCV 2024 Best Paper Award, Honorable Mention.
Jul 2024 Acoustic Primitives accepted to ECCV 2024.
Sep 2023 One paper accepted to NeurIPS 2023.
Feb 2023 One paper accepted to CVPR 2023.

Selected Works

A few projects that define what I care about.

DRIFT

The first method to inject directional reasoning structure into MLLM fine-tuning. Instead of hoping reasoning emerges, we explicitly steer models toward structured, faithful cross-modal inference.

ACL Findings 2026 · Paper · Project · Code
ZeroSep

A zero-shot paradigm shift for audio separation: separate any sound category without ever training on it. Pretrained diffusion priors replace paired training data entirely.

NeurIPS 2025 · Paper · Project · Code
DAVIS

The first generative diffusion approach to visually-guided sound separation, producing high-fidelity audio from diverse real-world mixtures. Best Paper Award, Honorable Mention at ACCV 2024.

ACCV 2024 → IJCV 2025 · Best Paper Award, Honorable Mention · Paper · Project · Code

20+ papers at NeurIPS, CVPR, ICCV, ECCV, ICLR, IJCV, and more.

View all publications →

Experience

Tencent Hunyuan

Research Scientist
2026 – Present

Omni-modal LLM research and deployment at scale.

AMD Research

Research Scientist Intern
2025

Multimodal reasoning and efficient adaptation of large language models. Led to DRIFT, accepted at ACL Findings 2026.

Meta Reality Labs Research

Research Scientist Intern · Cambridge, UK
2024

Audio-visual learning for cinematic audio highlighting and sound design. Led to a CVPR 2025 paper.

Meta — Codec Avatars Lab

Research Scientist Intern · Pittsburgh
2023

Neural acoustic modeling and spatial audio for human body soundfields. Published at ECCV 2024.

The Chinese University of Hong Kong

Research Assistant
2019 – 2020

3D point cloud processing and non-local denoising methods.

Talks

Mar 2026
Efficient Adaptation of Foundation Models for Multimodal Content Creation · Adobe Research · Invited Talk
Mar 2026
Efficient Adaptation of Foundation Models for Multimodal Content Creation · Simon Fraser University · Invited Talk
Jun 2023
Ego-AV-Loc: Egocentric Audio-Visual Object Localization · Joint International 3rd Ego4D and 11th EPIC Workshop @ CVPR 2023

Honors & Service

Awards

  • Best Paper Award, Honorable Mention · ACCV 2024
  • Best Demo Award, Runner-up · AAAI 2026
  • NeurIPS 2025 Scholar Award
  • ICCV 2025 Doctoral Consortium

Service