WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation Paper • 2508.16763 • Published Aug 22, 2025 • 2
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning Paper • 2508.09804 • Published Aug 13, 2025
Scope: Selective Cross-modal Orchestration of Visual Perception Experts Paper • 2510.12974 • Published Oct 14, 2025
Grounding Computer Use Agents on Human Demonstrations Paper • 2511.07332 • Published Nov 10, 2025 • 105
When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks Paper • 2210.12786 • Published Oct 23, 2022
Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory Paper • 2307.10768 • Published Jul 20, 2023
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Paper • 2210.07179 • Published Oct 13, 2022 • 3
Learning to Learn: How to Continuously Teach Humans and Machines Paper • 2211.15470 • Published Nov 28, 2022
Image-editing/Emu3-Base-SFT-without_cot-Jul25_sft.c_lr5e-5-checkpoint-800 8B • Updated Jul 29, 2025 • 6