Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 28
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence Paper • 2506.15677 • Published Jun 18 • 23
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29 • 23
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding Paper • 2311.03354 • Published Nov 6, 2023 • 8