OSCILLATION INVERSION: UNDERSTANDING THE STRUCTURE OF LARGE FLOW MODELS THROUGH THE LENS OF INVERSION METHODS

anonymous1*

Oscillation Inversion is a phenomenon observed in large flow models. Building on it, we developed a simple, fast distribution transfer method that enables image and video enhancement as well as low-level editing such as relighting and recoloring.

Main Teaser
Overview of an industry image enhancer boosted by diffusion editing. (P+) denotes Piscart preprocessing for ID preservation.

Our method acts as a domain transfer from a lower-quality image distribution to a higher-quality one. Because the transfer is stable and distribution-preserving, the results can be used directly as a video enhancer (slightly temporally smoothed with AnimateDiff).

Abstract

We explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model. When we invert real-world images with a fixed-point-inspired iterative approach, the solution does not converge; instead, it oscillates between distinct clusters. Through both toy experiments and real-world diffusion models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights showing that this behavior arises from oscillatory dynamics in rectified flow models. Building on this understanding, we introduce a simple, fast distribution transfer technique that facilitates image and video enhancement, as well as low-level editing tasks such as lighting and recoloring. We further provide quantitative results demonstrating the effectiveness of our method on image enhancement and on makeup transfer, treated as a recoloring task.
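The non-convergence described above can be illustrated with a one-dimensional toy problem. The sketch below is not the Flux setup or the procedure used in this work; it assumes a hypothetical scalar velocity field `velocity(z) = c * tanh(z)` and a one-step flow `x = z + velocity(z)`, and it tries to invert that step with the naive fixed-point update `z <- x - velocity(z)`. With `c = 2` the true solution of `z + velocity(z) = 0` is `z = 0`, but the update is not a contraction there, so the iterates settle into a two-point cycle instead of converging.

```python
import math


def velocity(z, c=2.0):
    """Hypothetical toy velocity field (an assumption, not a trained model)."""
    return c * math.tanh(z)


def fixed_point_inversion(x, z0, steps=50, c=2.0):
    """Iterate z <- x - velocity(z), a fixed-point attempt at solving
    z + velocity(z) = x. Returns every iterate, starting with z0."""
    trajectory = [z0]
    z = z0
    for _ in range(steps):
        z = x - velocity(z, c)
        trajectory.append(z)
    return trajectory


# Invert the target x = 0 from an arbitrary starting guess.
traj = fixed_point_inversion(x=0.0, z0=0.5)

# The last few iterates alternate in sign with nearly constant magnitude:
# the iteration is trapped in a period-2 cycle around the true solution z = 0.
print([round(z, 3) for z in traj[-4:]])
```

In this toy, the two points of the cycle play the role of the "clusters" mentioned in the abstract only by loose analogy; the slope of the update at the true solution has magnitude `c`, so any `c > 1` makes the fixed point unstable while the bounded `tanh` keeps the iterates from diverging.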

Image Enhancement

Graphical Model

Oscillation produces a high-quality distribution (the 'Output' row) from two or more 'weak' distributions (the 'Input' and 'Augmented' rows). The augmented distribution can be obtained from an off-the-shelf lightweight enhancer or a standard image processing technique. The output is high quality, with highly realistic style and texture, and avoids the over-smoothing common in generated images.

Lighting Enhancement

Oscillation produces high-quality lighting harmonization guided by stroke prompts.

Recolor for Makeup

Oscillation produces high-quality facial makeup harmonization guided by stroke prompts.

Video Enhancement

More results coming soon:

1. See the AnimateDiff results above.

2. We are working on Oscillation Inversion for video flow matching.

3. We are working on a temporal module for Flux.

BibTex

@article{...,
  title={...},
  author={...},
  journal={arXiv preprint arXiv:...},
  year={2024}
}

Acknowledgements: This research is supported by the UT Austin VITA Group. Thanks to Prof. Atlas Wang!