DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2
Scalable Extraction of Training Data from (Production) Language Models Paper • 2311.17035 • Published Nov 28, 2023 • 3
Git Re-Basin: Merging Models modulo Permutation Symmetries Paper • 2209.04836 • Published Sep 11, 2022 • 1
PLeaS -- Merging Models with Permutations and Least Squares Paper • 2407.02447 • Published Jul 2, 2024
PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling Paper • 2503.09368 • Published 12 days ago • 2
Generating Multi-Image Synthetic Data for Text-to-Image Customization Paper • 2502.01720 • Published Feb 3 • 8
Post: Google drops Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more. Now available in anychat, try it out: akhaliq/anychat
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 14
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance Paper • 2409.15759 • Published Sep 24, 2024 • 1
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers Paper • 2409.15760 • Published Sep 24, 2024 • 1
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 86
Post: QwQ-32B-Preview is now available in anychat. A reasoning model that is competitive with OpenAI o1-mini and o1-preview. Try it out: akhaliq/anychat
Post: New model drop in anychat. allenai/Llama-3.1-Tulu-3-8B is now available. Try it here: akhaliq/anychat
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 36
Disentangled Motion Modeling for Video Frame Interpolation Paper • 2406.17256 • Published Jun 25, 2024
Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation Paper • 2403.10911 • Published Mar 16, 2024