Update README.md
README.md
CHANGED
@@ -10,6 +10,27 @@ pinned: false
 license: mit
 short_description: Torch Transformers Diffusion SFT for Computer Vision
 ---
+
+
+
+## Abstract
+Explore AI vision with `torch`, `transformers`, and `diffusers`! Dual `st.camera_input` 📷 captures feed async OCR (Qwen2-VL, TrOCR), image gen (Stable Diffusion), and line drawings (Torch Space-inspired) on CPU. Key papers:
+
+- 🌐 **[Streamlit](https://arxiv.org/abs/2308.03892)** - Thiessen et al., 2023: UI.
+- 🔥 **[PyTorch](https://arxiv.org/abs/1912.01703)** - Paszke et al., 2019: Core.
+- 🔍 **[Qwen2-VL](https://arxiv.org/abs/2408.11039)** - Li et al., 2024: Multimodal OCR.
+- 🔍 **[TrOCR](https://arxiv.org/abs/2109.10282)** - Li et al., 2021: Small OCR.
+- 🎨 **[LDM](https://arxiv.org/abs/2112.10752)** - Rombach et al., 2022: Image gen.
+- 👁️ **[OpenCV](https://arxiv.org/abs/2308.11236)** - Bradski, 2000: CV tools.
+
+Run: `pip install -r requirements.txt`, `streamlit run ${app_file}`. Snap, test, innovate! ${emoji}
+
+## Usage 🎯
+- 📷 **Camera Snap**: Single or burst capture (auto 10 frames) with gallery.
+- 🔍 **Test OCR**: `Qwen2-VL-OCR-2B` or `TrOCR-Small` extracts text, saved async.
+- 🎨 **Test Image Gen**: `OFA-Sys/small-stable-diffusion-v0` generates images, saved async.
+- ✏️ **Test Line Drawings**: OpenCV line art (Torch Space-inspired), saved async.
+
 ## Abstract
 Fuse `torch`, `transformers`, and `diffusers` for SFT-powered NLP and CV! Dual `st.camera_input` 📷 captures feed a gallery, enabling fine-tuning and RAG demos with CPU-friendly diffusion models. Key papers:
 
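The Usage entries in the added README text describe a 10-frame burst capture whose outputs are "saved async". The diff does not show how the Space implements this, so the following is only a minimal sketch of one possible approach, using the standard-library `asyncio.to_thread` to keep blocking file writes off the event loop. The helper names (`save_bytes`, `save_burst`, `demo`) and the `frame_NN.png` naming scheme are hypothetical and do not come from the Space's code.

```python
import asyncio
import tempfile
from pathlib import Path


def _write_bytes(path: Path, data: bytes) -> int:
    """Blocking write; returns the number of bytes written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.write_bytes(data)


async def save_bytes(path: Path, data: bytes) -> Path:
    """Run the blocking write in a worker thread so the event loop stays free."""
    await asyncio.to_thread(_write_bytes, path, data)
    return path


async def save_burst(out_dir: Path, frames: list[bytes]) -> list[Path]:
    """Save a burst of frames concurrently (file names are illustrative only)."""
    tasks = [
        save_bytes(out_dir / f"frame_{i:02d}.png", frame)
        for i, frame in enumerate(frames)
    ]
    # gather() returns results in the same order as the tasks were passed in.
    return await asyncio.gather(*tasks)


def demo() -> list[str]:
    # Ten placeholder payloads stand in for the 10-frame burst capture.
    frames = [f"frame-{i}".encode() for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = asyncio.run(save_burst(Path(tmp), frames))
        # Collect names while the temporary directory still exists.
        return [p.name for p in paths if p.exists()]
```

In a Streamlit script, the bytes would presumably come from `st.camera_input(...).getvalue()` rather than placeholder payloads. Whether `asyncio.run` is the right entry point depends on how the app manages its event loop, which this diff does not reveal.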