Clip Vision - a rovo Collection

Clip Vision

Updated Dec 27, 2024

InvokeAI/ip_adapter_sd_image_encoder

Updated Sep 23, 2023 • 10.6k • 11
InvokeAI/ip_adapter_sdxl_image_encoder

Updated Sep 23, 2023 • 7.8k • 14
vikhyatk/moondream2

Image-Text-to-Text • Updated 29 days ago • 153k • 1.02k
Running

420

moondream2

🌔

a tiny vision language model
Running on T4

1.24k

CLIP Interrogator 2

🕵

Generate text descriptions from images
Running on A10G

2.82k

CLIP Interrogator

🕵

Analyze image to generate descriptive prompt
Running on Zero

86

Llava Llama-3 8B

🔥

Meta Llama3 8b with Llava Multimodal capabilities
Running on Zero

1.11k

FLUX Prompt Generator

😻

Display a user interface for various tasks
apple/MobileCLIP-S2-OpenCLIP

Zero-Shot Image Classification • Updated Jul 22, 2024 • 56.2k • 6
openai/clip-vit-large-patch14

Zero-Shot Image Classification • Updated Sep 15, 2023 • 21.1M • 1.62k
zer0int/CLIP-GmP-ViT-L-14

Zero-Shot Image Classification • Updated Sep 23, 2024 • 27.4k • 380
Running on Zero

182

OmniParser

😻

Convert GUI screen to structured elements
meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • Updated Dec 4, 2024 • 2.15M • 1.29k
huihui-ai/Llama-3.2-11B-Vision-Instruct-abliterated

Image-Text-to-Text • Updated Oct 22, 2024 • 6.86k • 21
Running on Zero

62

Prompt Enhancer with WD Tagger & Florence 2 Flux/SD3 Captioner

🏃

Generate detailed image descriptions for prompts
Running on Zero

71

Florence 2 Flux

🦀

Generate detailed image descriptions
apple/DepthPro

Depth Estimation • Updated Oct 9, 2024 • 2.01k • 394
Running

76

Omnivlm Dpo Demo

👁

Upload images and get detailed descriptions