MMScan Hierarchical Visual Grounding Challenge
Challenge Introduction
Hierarchical Visual Grounding (HVG) Task in the MMScan Benchmark:
This task evaluates a model's ability to perform visual grounding at multiple levels of granularity: from region-level to object-level, and from single-target localization to inter-target localization. Given a natural language description, models are expected to accurately locate the corresponding object(s) within the 3D scenes, reflecting comprehensive spatial and attribute-level understanding.
Overview: You can refer to this website for an overview and our paper for more details.
Challenge Data and Codebase: The challenge dataset includes:
- Training set: Language prompts + ground-truth bounding boxes
- Validation set: Language prompts + ground-truth bounding boxes
- Test set: Language prompts only (no ground truth provided)
Follow the instructions to get familiar with data organization and MMScan APIs. All the code for MMScan is available here.
Evaluation Metrics: For the visual grounding task, our evaluator computes multiple metrics including AP (Average Precision), AR, and gTop-k. The gTop-k metric generalizes the traditional Top-k metric and offers greater flexibility and interpretability for multi-target grounding.
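All of these metrics rest on the 3D IoU between a predicted box and a ground-truth box. As a simplified illustration only (the official evaluator handles full 9-DoF oriented boxes, while this sketch ignores rotation), an axis-aligned 3D IoU can be computed like this:

```python
def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes given as (cx, cy, cz, dx, dy, dz).

    Simplified sketch: the challenge's boxes carry 9 parameters
    (center, size, rotation); rotation is ignored here.
    """
    inter = 1.0
    for i in range(3):
        a_min, a_max = box_a[i] - box_a[i + 3] / 2, box_a[i] + box_a[i + 3] / 2
        b_min, b_max = box_b[i] - box_b[i + 3] / 2, box_b[i] + box_b[i + 3] / 2
        overlap = min(a_max, b_max) - max(a_min, b_min)
        if overlap <= 0:
            return 0.0  # boxes are disjoint along this axis
        inter *= overlap
    vol_a = box_a[3] * box_a[4] * box_a[5]
    vol_b = box_b[3] * box_b[4] * box_b[5]
    return inter / (vol_a + vol_b - inter)
```

A prediction is then counted as a match when its IoU with a ground-truth box exceeds the metric's threshold.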
Contact: For any questions related to the HVG challenge, feel free to reach out to Jingli Lin.
How to Participate
To register for the challenge, please contact us via Google Mail and include the following information:
- A self-chosen username (this will be shown on the leaderboard)
- A login password
- Your team or institution name
- A brief statement on your motivation for participating
Submission limit: Each user is allowed a maximum of 5 submissions per day.
Submission Guidelines
Your submission should be a dictionary, where each key is a sample ID from the test split.
For each sample, provide:
- pred_bboxes: a list of predicted bounding boxes
- scores: the corresponding confidence scores
An expected result is:

```
{
    'VG_Inter_Space_OO__1mp3d_0009_region0__55':  # sample ID
    {
        'pred_bboxes': [[...], ...],  # list, 100 * 9
        'scores': [...]               # list, 100
    },
    ...
}
```
Note: The bounding boxes do not need to be sorted by confidence.
Limit the number of predicted boxes to 100 per sample. If your submission contains more than 100 boxes for a single sample, only the top 100 will be considered.
Efficiency Tip: Round all floating-point numbers in your submission to two decimal places to reduce file size and transmission overhead. (To ensure fairness during evaluation, all decimal numbers in the submitted predictions will be rounded to two decimal places.)
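The guidelines above can be sketched as a small packaging helper. The function name and the shape of `raw_predictions` are assumptions for illustration; only the output dictionary layout, the 100-box cap, and the two-decimal rounding come from the guidelines:

```python
import json

MAX_BOXES = 100  # per-sample cap stated in the guidelines


def format_submission(raw_predictions, out_path="submission.json"):
    """Package per-sample predictions into the expected submission dict.

    `raw_predictions` (hypothetical format) maps each test sample ID to a
    (boxes, scores) pair, where each box is a list of 9 floats.
    """
    submission = {}
    for sample_id, (boxes, scores) in raw_predictions.items():
        # Keep the 100 highest-scoring boxes. Sorting is not required by
        # the evaluator, but it makes the cap deterministic here.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        order = order[:MAX_BOXES]
        submission[sample_id] = {
            # Round to two decimals to shrink the file; the evaluator
            # rounds submitted values to two decimals anyway.
            "pred_bboxes": [[round(v, 2) for v in boxes[i]] for i in order],
            "scores": [round(scores[i], 2) for i in order],
        }
    with open(out_path, "w") as f:
        json.dump(submission, f)
    return submission
```

Running it over your test-split predictions produces a single JSON file ready for upload.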