Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published Dec 18, 2024 • 126
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • Paper • 2409.17146 • Published Sep 25, 2024 • 106
Running • 84 • Gradio Lipsync Wav2lip • Combine audio with a video or image to create a lip-synched video
Running on Zero • 460 • Florence2 + SAM2 • Segment objects in images and videos using text prompts
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images • Paper • 2406.13735 • Published Jun 19, 2024 • 5
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing • Paper • 2406.10601 • Published Jun 15, 2024 • 66
Running on Zero • 720 • Florence 2 • Analyze images to generate captions, detect objects, or perform OCR
Running on Zero • 120 • MimicBrush • Transfers textures from a reference image to a masked region in a source image