C4AI Aya Vision Collection: Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages (5 items).
Article: A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality
Papers:
Unified Reward Model for Multimodal Understanding and Generation (arXiv 2503.05236)
Token-Efficient Long Video Understanding for Multimodal LLMs (arXiv 2503.04130)
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (arXiv 2502.18449)
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks (arXiv 2502.17157)
SurveyX: Academic Survey Automation via Large Language Models (arXiv 2502.14776)
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning (arXiv 2502.14768)
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features (arXiv 2502.14786)
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation (arXiv 2502.13143)
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU (arXiv 2502.08910, published Feb 13)
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data (arXiv 2502.08468, published Feb 12)
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models (arXiv 2502.03032, published Feb 5)
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model (arXiv 2502.02737, published Feb 4)