Spaces: Running on Zero

Update README.md

README.md CHANGED
[](https://bryanswkim.github.io/chain-of-zoom/)
[](https://arxiv.org/abs/2505.18600)

---

## 🔥 Summary

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:

1. **Blur and artifacts** when pushed to magnify beyond their training regime
2. **High computational cost and inefficiency** of retraining models when we want to magnify further

This brings us to the fundamental question: \
_How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?_

We address this via **Chain-of-Zoom** 🔍, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts.
CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training.
Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt-extractor VLM.
This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align the text guidance with human preference.
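The scale-autoregressive decomposition above can be written as a sketch (notation is ours, not taken verbatim from the paper): with $x_0$ the low-resolution input, $x_N$ the final output after $N$ zoom steps, and $c_n$ the multi-scale-aware prompt at step $n$,

$$
p(x_N \mid x_0) = \prod_{n=1}^{N} p(x_n \mid x_{n-1}, c_n)
$$

Each factor is a single zoom step at a scale the frozen backbone SR model was actually trained for, which is what lets the chain reach extreme magnifications without retraining.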
## 🛠️ News

- [May 2025] Code and paper are uploaded.

## 🛠️ Setup

First, create your environment. We recommend using the following commands.

```
git clone https://github.com/bryanswkim/Chain-of-Zoom.git
cd Chain-of-Zoom

conda create -n coz python=3.10
conda activate coz
pip install -r requirements.txt
```
## ⏳ Models

|Models|Checkpoints|
|:---------|:--------|
|Stable Diffusion v3|[Hugging Face](https://huggingface.co/stabilityai/stable-diffusion-3-medium)|
|Qwen2.5-VL-3B-Instruct|[Hugging Face](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct)|
|RAM|[Hugging Face](https://huggingface.co/spaces/xinyu1205/recognize-anything/blob/main/ram_swin_large_14m.pth)|
## 🔍 Example

You can quickly check the results of using **CoZ** with the following example:

```
python inference_coz.py \
  -i samples \
  -o inference_results/coz_vlmprompt \
  --rec_type recursive_multiscale \
  --prompt_type vlm \
  --lora_path ckpt/SR_LoRA/model_20001.pkl \
  --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
  --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
  --ram_ft_path ckpt/DAPE/DAPE.pth \
  --ram_path ckpt/RAM/ram_swin_large_14m.pth
```

This will give a result like the one below:


## 💬 Efficient Memory

Using ```--efficient_memory``` allows CoZ to run on a single GPU with 24 GB VRAM, but greatly increases inference time due to model offloading. \
We recommend using two GPUs.
## 📝 Citation

If you find our method useful, please cite it as below or leave a star on this repository.

```
@article{kim2025chain,
  title={Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment},
  author={Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2505.18600},
  year={2025}
}
```
## 🤗 Acknowledgements

We thank the authors of [OSEDiff](https://github.com/cswry/OSEDiff) for sharing their awesome work!

> [!NOTE]
> This work is currently in the preprint stage, and there may be some changes to the code.
The updated README.md now contains only the Space configuration:

```
title: Chain-of-Zoom
emoji: 🔍
colorFrom: green
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
```