---
license: apache-2.0
base_model:
- DeepGlint-AI/MLCD-Embodied-7B
---

[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-5?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-3?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog-1?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-8?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-4?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-9?p=multi-label-cluster-discrimination-for-visual)
[](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=multi-label-cluster-discrimination-for-visual)
## RefCOCO Segmentation Evaluation

| Dataset | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| RefCOCO | val | **83.6** | 82.4 | 79.5 | 79.2 | 74.9 |
| RefCOCO | testA | **85.3** | 84.2 | 83.2 | 82.3 | 79.1 |
| RefCOCO | testB | **81.5** | 80.2 | 76.9 | 77.0 | 72.3 |
| RefCOCO+ | val | **79.4** | 76.5 | 72.6 | 68.9 | 65.1 |
| RefCOCO+ | testA | **82.9** | 80.0 | 78.7 | 75.8 | 70.8 |
| RefCOCO+ | testB | **75.6** | 71.9 | 64.6 | 61.8 | 58.1 |
| RefCOCOg | val | **79.7** | 78.2 | 74.2 | 73.3 | 67.9 |
| RefCOCOg | test | **80.5** | 78.3 | 74.9 | 74.8 | 70.6 |
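The table does not name its metric. Assuming the scores are cumulative IoU (cIoU), which referring expression segmentation work commonly reports, the sketch below shows how such a score could be computed from binary masks. The function names and the nested-list mask format are illustrative assumptions and are not taken from the MLCD-Seg code.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical mask format: rows of 0/1 pixel values.
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count foreground pixels in the intersection and union of two masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same height")
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        if len(pred_row) != len(gt_row):
            raise ValueError("masks must have the same width")
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def cumulative_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """cIoU: total intersection over total union across all (pred, gt) pairs."""
    total_inter = total_union = 0
    for pred, gt in pairs:
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


if __name__ == "__main__":
    pred_a, gt_a = [[1, 1], [0, 0]], [[1, 0], [0, 0]]
    pred_b, gt_b = [[0, 0], [1, 1]], [[0, 0], [1, 1]]
    score = cumulative_iou([(pred_a, gt_a), (pred_b, gt_b)])
    print(f"cIoU = {100 * score:.1f}")
```

Because intersections and unions are summed over the whole split before dividing, large objects weigh more than small ones; averaging per-image IoU instead would give a different number.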
## Evaluation

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
# trust_remote_code=True is needed because the model code ships with the repo
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert('RGB')
seg_prompt = "The <image> provides an overview of the picture.\nCould you provide a segmentation mask for the right giraffe in this image?"
pred_mask = mlcd_seg.predict_forward(seg_img, seg_prompt, tokenizer, force_seg=False)
```
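The prompt in the example above pairs a fixed sentence containing the `<image>` placeholder with a request that names the target object. A small helper can fill that template for other referring expressions. This is a minimal sketch: `build_seg_prompt` and `SEG_PROMPT_TEMPLATE` are hypothetical names, and the template text is copied from the example; this card does not state whether other wording works equally well.

```python
# Template text copied from the example above; the helper itself is hypothetical.
SEG_PROMPT_TEMPLATE = (
    "The <image> provides an overview of the picture.\n"
    "Could you provide a segmentation mask for {target} in this image?"
)


def build_seg_prompt(target: str) -> str:
    """Return a segmentation prompt for a referring expression such as 'the right giraffe'."""
    target = target.strip()
    if not target:
        raise ValueError("target must be a non-empty referring expression")
    return SEG_PROMPT_TEMPLATE.format(target=target)


seg_prompt = build_seg_prompt("the right giraffe")
```

The resulting string can be passed as `seg_prompt` to `predict_forward` in the same way as the hard-coded prompt.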
## Tips for updating this repo in the future

Hugging Face caches the repo's remote module code locally, so the cached files must be cleared manually after the repo is updated.

```bash
cd ~/.cache/huggingface/modules/transformers_modules
rm mlcd_seg.py vision_projector.py vision_resampler.py vision_tower.py sam.py conversation_mlcd_seg.py
```
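The same cleanup can be scripted in Python, which avoids `rm` errors when some of the files are already absent. This is a sketch and not part of the repo: `clear_cached_modules` is a hypothetical helper, and the directory and file names are copied from the shell commands above. If your cache lives elsewhere, pass that directory as `cache_dir`.

```python
from pathlib import Path
from typing import Iterable, List

# Directory and file names copied from the shell commands above.
CACHE_DIR = Path.home() / ".cache" / "huggingface" / "modules" / "transformers_modules"
MODULE_FILES = (
    "mlcd_seg.py",
    "vision_projector.py",
    "vision_resampler.py",
    "vision_tower.py",
    "sam.py",
    "conversation_mlcd_seg.py",
)


def clear_cached_modules(
    cache_dir: Path = CACHE_DIR, names: Iterable[str] = MODULE_FILES
) -> List[Path]:
    """Delete the listed files from cache_dir and return the paths actually removed."""
    removed = []
    for name in names:
        path = Path(cache_dir) / name
        if path.is_file():  # skip files that are not cached
            path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    for path in clear_cached_modules():
        print(f"removed {path}")
```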
## Citations

```
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Zhou, Xinyu and An, Xiang and Deng, Jiankang and Jie, Yu},
  title = {MLCD-Seg},
  year = {2025},
  url = {https://github.com/deepglint/unicom/tree/main/downstream},
}
```