Spaces: Running on Zero

Update README.md

README.md CHANGED
[](https://bryanswkim.github.io/chain-of-zoom/)
[](https://arxiv.org/abs/2505.18600)

---

## 🔥 Summary

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:

1. **Blur and artifacts** when pushed to magnify beyond their training regime
2. **High computational cost and inefficiency** of retraining models when we want to magnify further

This brings us to the fundamental question: \
_How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?_

We address this via **Chain-of-Zoom** 🔍, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts.
CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training.
Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt-extractor VLM.
This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align the text guidance with human preference.
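The scale-autoregressive decomposition above can be written as a sketch (notation is ours, not taken verbatim from the paper): with $x_0$ the low-resolution input, $x_N$ the final output after $N$ zoom steps, and $c_n$ the multi-scale-aware prompt at step $n$,

$$
p(x_N \mid x_0) = \prod_{n=1}^{N} p(x_n \mid x_{n-1}, c_n)
$$

Each factor is a single zoom step at a scale the frozen backbone SR model was actually trained for, which is what lets the chain reach extreme magnifications without retraining.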
## 🛠️ News

- [May 2025] Code and paper are uploaded.

## 🛠️ Setup

First, create your environment. We recommend using the following commands.

```
git clone https://github.com/bryanswkim/Chain-of-Zoom.git
cd Chain-of-Zoom

conda create -n coz python=3.10
conda activate coz
pip install -r requirements.txt
```
## ⏳ Models

|Models|Checkpoints|
|:---------|:--------|
|Stable Diffusion v3|[Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-3-medium)|
|Qwen2.5-VL-3B-Instruct|[Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|
|RAM|[Hugging Face](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth)|
## 🔍 Example

You can quickly check the results of using **CoZ** with the following example:

```
python inference_coz.py \
  -i samples \
  -o inference_results/coz_vlmprompt \
  --rec_type recursive_multiscale \
  --prompt_type vlm \
  --lora_path ckpt/SR_LoRA/model_20001.pkl \
  --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
  --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
  --ram_ft_path ckpt/DAPE/DAPE.pth \
  --ram_path ckpt/RAM/ram_swin_large_14m.pth
```

This will give a result like the one below:


## 💬 Efficient Memory

Using ```--efficient_memory``` allows CoZ to run on a single GPU with 24 GB VRAM, but greatly increases inference time due to model offloading. \
We recommend using two GPUs.
## 📝 Citation

If you find our method useful, please cite it as below or leave a star on this repository.

```
@article{kim2025chain,
  title={Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment},
  author={Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2505.18600},
  year={2025}
}
```
## 🤗 Acknowledgements

We thank the authors of [OSEDiff](https://github.com/cswry/OSEDiff) for sharing their awesome work!

> [!NOTE]
> This work is currently in the preprint stage, and there may be some changes to the code.
The updated README.md now contains only the Space configuration:

```
title: Chain-of-Zoom
emoji: 🔍
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
```