A fine-grained visual reasoning benchmark (more question types are shown in the extension dataset).
Sicheng Feng
FSCCS
Recent Activity
Upvoted a paper, about 8 hours ago
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding
Upvoted a paper, 9 days ago
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
Liked a dataset, 25 days ago
domenicrosati/TruthfulQA