Ramanana Rahary
AdrienRR
AI & ML interests
None yet
Organizations
Vision
- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD (Paper • 2404.06512 • Published • 30)
- Adapting LLaMA Decoder to Vision Transformer (Paper • 2404.06773 • Published • 18)
- Quantized Visual Geometry Grounded Transformer (Paper • 2509.21302 • Published • 9)
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation (Paper • 2509.24335 • Published • 9)
Multimodal
Models: 0 (none public yet)
Datasets: 0 (none public yet)