Scaling Concept with Text-Guided Diffusion Models

1University of Rochester, 2The University of Texas at Dallas, 3Meta Reality Labs Research

tl;dr: We use pretrained text-guided diffusion models
to scale concepts up or down in images and audio



Overview

We present a training-free method to scale existing concepts in images and audio up or down. This supports many existing editing tasks, such as face attribute editing, image harmonization, and audio separation, and also enables new applications like canonical pose generation, weather manipulation, anime sketch enhancement, generative audio highlighting, and more. We show some examples below. Place your mouse on an image to see the transformation.

Anime Enhancement

By scaling up the "anime" concept, we can mitigate the fuzziness and blurriness issues commonly encountered in the anime production process.


Canonical Pose Generation

By scaling up the concept of an object, we can adjust its pose to be more complete and visible.


Object Stitching

By enhancing an object's concept, we can seamlessly stitch the object and the background together, completing and harmonizing the whole image.


Abstract

Text-guided diffusion models have revolutionized generative tasks by producing high-fidelity content from text descriptions. They have also enabled an editing paradigm where concepts can be replaced through text conditioning (e.g., a dog to a tiger). In this work, we explore a novel approach: instead of replacing a concept, can we enhance or suppress the concept itself? Through an empirical study, we identify a trend where concepts can be decomposed in text-guided diffusion models. Leveraging this insight, we introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real inputs without introducing new elements. To systematically evaluate our approach, we present the WeakConcept-10 dataset, where concepts are imperfect and need to be enhanced. More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains, including tasks such as canonical pose generation and generative sound highlighting or removal.
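To make the idea of scaling a decomposed concept concrete, here is a minimal sketch of the kind of combination rule such a method might apply at each denoising step. It assumes a classifier-free-guidance-style decomposition of the noise prediction into an unconditional part and a concept-conditioned part; the function name and interface are hypothetical and not the paper's actual implementation.

```python
import numpy as np

def scale_concept(eps_uncond, eps_cond, omega):
    """Recombine diffusion noise predictions with a scaled concept term.

    eps_uncond: noise prediction given a null (empty) prompt
    eps_cond:   noise prediction conditioned on the concept prompt
    omega:      concept scale; omega > 1 amplifies the concept,
                0 < omega < 1 attenuates it, and omega = 0 removes
                the concept's contribution entirely.
    """
    # The difference (eps_cond - eps_uncond) isolates the concept's
    # contribution; omega rescales it before adding it back.
    return eps_uncond + omega * (eps_cond - eps_uncond)

# Toy example with dummy noise predictions (real ones come from a
# pretrained text-guided diffusion model at each timestep):
eps_u = np.zeros(4)
eps_c = np.ones(4)
print(scale_concept(eps_u, eps_c, 2.0))  # concept scaled up
print(scale_concept(eps_u, eps_c, 0.5))  # concept scaled down
```

With omega = 1 this reduces to the ordinary conditional prediction, which is why scaling can be applied to a real input (after inversion) without introducing new elements.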