VisAH: Audio Highlighting Demo Gallery

Exploring visually-guided acoustic highlighting to transform audio experiences

VisAH is a novel approach that transforms audio to deliver appropriate highlighting effects guided by the accompanying video. This gallery showcases examples of VisAH in action, comparing our results with other methods and demonstrating applications.

Movie Comparison Examples

Examples from our Muddy Mixed Dataset, showcasing the poorly mixed input video, LCE highlighting results, VisAH model outputs, and the original movie clips for comparison.

Example 1: Movie "No Way Out"

In this example, the speech is not properly highlighted in the input, and our model resolves this issue.

Example 2: Movie "Shooter"

In this video, our model highlights the sound effect properly.

Example 3: Movie "The Amazing Spider-Man"

In this video, our model highlights the speech properly.

V2A Refinement Application

Our VisAH model can refine video-to-audio generation by rebalancing audio sources in alignment with the video, resulting in improved audio-visual coherence.
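VisAH learns this rebalancing end to end, but the underlying operation can be pictured with a minimal sketch: given separated source stems, apply a per-source gain and remix. The stem names and gain values below are purely hypothetical and are not VisAH's actual interface.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def remix(stems, gains_db):
    """Remix separated audio stems with per-source gains.

    stems: dict mapping source name -> list of float samples (equal length)
    gains_db: dict mapping source name -> gain in dB (0 dB = unchanged)
    """
    length = len(next(iter(stems.values())))
    mix = [0.0] * length
    for name, samples in stems.items():
        gain = db_to_linear(gains_db.get(name, 0.0))
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix

# Hypothetical example: boost speech by 6 dB, cut music by 6 dB.
stems = {"speech": [0.1, 0.2], "music": [0.4, 0.4]}
mix = remix(stems, {"speech": 6.0, "music": -6.0})
```

In practice the gains would vary over time and be predicted from the video; fixed per-stem gains are used here only to keep the sketch self-contained.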

MovieGen Examples

Note: Videos are sourced from the MovieGen website. All videos are adjusted to the same loudness level.
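Loudness matching as in the note above can be approximated with a simple RMS-based normalization. This is only a rough sketch: broadcast loudness standards such as ITU-R BS.1770 (LUFS) additionally apply frequency weighting and gating.

```python
import math

def rms_dbfs(samples):
    """RMS level of a mono signal in dBFS (samples in [-1.0, 1.0])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def match_loudness(samples, target_dbfs):
    """Scale the signal so its RMS level equals target_dbfs."""
    gain = 10 ** ((target_dbfs - rms_dbfs(samples)) / 20)
    return [s * gain for s in samples]

# Example: bring a quiet 440 Hz tone (1 s at 16 kHz) up to -20 dBFS.
quiet = [0.05 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
leveled = match_loudness(quiet, -20.0)
```

Applying the same target level to every clip, as done for the gallery videos, makes side-by-side comparisons fair.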

OpenAI Sora + Seeing-and-Hearing Examples

The generated videos are from OpenAI Sora, and the corresponding audio is generated by Seeing-and-Hearing.

Real Video Refinement

Our VisAH model can also be applied to real-world videos, where audio is often recorded with suboptimal quality and may require rebalancing.
The videos are sourced from the AudioCaps dataset.