Börje Karlsson

tellarin

https://tellarin.com/borje/

AI & ML interests

Machine Learning Systems, Mobile Sensing, Knowledge Mining, Digital Entertainment

Recent Activity

upvoted 2 papers 8 days ago

Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition

Paper • 2512.15603 • Published 8 days ago • 55

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Paper • 2512.10863 • Published 14 days ago • 21

upvoted a paper 10 days ago

Evaluating Gemini Robotics Policies in a Veo World Simulator

Paper • 2512.10675 • Published 15 days ago • 16

upvoted 11 papers 15 days ago

SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization

Paper • 2512.02631 • Published 24 days ago • 8

TV2TV: A Unified Framework for Interleaved Language and Video Generation

Paper • 2512.05103 • Published 21 days ago • 16

SIMA 2: A Generalist Embodied Agent for Virtual Worlds

Paper • 2512.04797 • Published 22 days ago • 23

ProPhy: Progressive Physical Alignment for Dynamic World Simulation

Paper • 2512.05564 • Published 21 days ago • 5

COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence

Paper • 2512.04563 • Published 22 days ago • 14

Embodied Referring Expression Comprehension in Human-Robot Interaction

Paper • 2512.06558 • Published 19 days ago • 3

VideoVLA: Video Generators Can Be Generalizable Robot Manipulators

Paper • 2512.06963 • Published 18 days ago • 3

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

Paper • 2512.08186 • Published 17 days ago • 21

MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

Paper • 2512.06628 • Published 19 days ago • 12

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Paper • 2512.07802 • Published 17 days ago • 43

Reflection Removal through Efficient Adaptation of Diffusion Transformers

Paper • 2512.05000 • Published 21 days ago • 15

upvoted a paper 17 days ago

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22 • 22

upvoted 2 papers 28 days ago

What does it mean to understand language?

Paper • 2511.19757 • Published Nov 24 • 9

Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

Paper • 2511.19900 • Published about 1 month ago • 47

upvoted 2 papers 29 days ago

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published Nov 17 • 52

GigaWorld-0: World Models as Data Engine to Empower Embodied AI

Paper • 2511.19861 • Published about 1 month ago • 30

upvoted a paper about 1 month ago

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published Nov 24 • 27