linyueqian/ME555_llava_v1.5_finetuned

metadata

language: en
license: mit
library_name: peft
base_model: llava-hf/llava-1.5-7b-hf
tags:
  - robotics
  - vision-language
  - task-detection
  - llava
datasets:
  - synthetic-data

Model Card for Unsolvable Robotic Task Detection

Model Details

Purpose: Detects when robotic tasks are impossible to complete
Base Model: LLaVA v1.5 7B
Developed by: Duke University
Type: Vision-language model, distributed as a LoRA adapter (PEFT)

Use Cases

Identifying unsolvable robotic tasks in real time
Explaining why tasks cannot be completed
Supporting safe human-robot interaction
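To query the model, each task instruction is posed together with an image of the scene. The sketch below wraps an instruction in the `USER: <image>\n... ASSISTANT:` chat format documented for the `llava-hf/llava-1.5-7b-hf` base model. It assumes the adapter was fine-tuned with that same format; the helper name `build_prompt` and the question wording are hypothetical, not the prompt used in training.

```python
# Hypothetical helper: wraps a robot task instruction in the LLaVA-1.5 chat
# format used by llava-hf/llava-1.5-7b-hf. The question wording is an
# assumption, not the exact prompt this adapter was trained on.
def build_prompt(task: str) -> str:
    question = (
        f"The robot is asked to: {task}. "
        "Can this task be completed in the scene shown? "
        "If not, explain why."
    )
    # "<image>" marks where the processor inserts the image features.
    return f"USER: <image>\n{question} ASSISTANT:"


prompt = build_prompt("pour water from the empty cup")
print(prompt)
```

The resulting string would be passed to the base model's processor together with the image; the model's reply after `ASSISTANT:` is where a solvability verdict and explanation would appear.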

Training Data

4,920 synthetic images with question-answer pairs
Covers five categories: Status Conflicts, Item Absences, Logical Contradictions, Ambiguous Tasks, and Ethical Constraints
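One way to represent a training example and its category is sketched below. The five category names come from this card; the record fields (`image_path`, `question`, `answer`, `category`) and the sample values are hypothetical and do not describe the actual dataset schema.

```python
from dataclasses import dataclass
from enum import Enum


class UnsolvableCategory(Enum):
    """The five categories of unsolvable tasks covered by the training data."""
    STATUS_CONFLICT = "Status Conflicts"
    ITEM_ABSENCE = "Item Absences"
    LOGICAL_CONTRADICTION = "Logical Contradictions"
    AMBIGUOUS_TASK = "Ambiguous Tasks"
    ETHICAL_CONSTRAINT = "Ethical Constraints"


@dataclass(frozen=True)
class TrainingExample:
    """Hypothetical record: one synthetic image with a question-answer pair."""
    image_path: str               # hypothetical path to the synthetic image
    question: str                 # task instruction posed about the image
    answer: str                   # expected answer, including the explanation
    category: UnsolvableCategory  # why the task cannot be completed


example = TrainingExample(
    image_path="images/kitchen_0001.png",
    question="Pick up the red apple from the table.",
    answer="This task cannot be completed: there is no red apple on the table.",
    category=UnsolvableCategory.ITEM_ABSENCE,
)
print(len(UnsolvableCategory))  # 5
print(example.category.value)   # Item Absences
```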

Performance

Success rate on SDXL-generated synthetic images: 78.05%
Success rate on simulator-generated synthetic images: 81.00%

Limitations

Works only with tasks similar to those in the training data
Requires human oversight
May not catch novel types of impossible tasks

Getting Started

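```python
# Basic fine-tuning configuration
config = {
    "USE_LORA": True,       # train a LoRA adapter instead of all base-model weights
    "LORA_R": 8,            # rank of the low-rank LoRA update matrices
    "LORA_ALPHA": 8,        # LoRA scaling numerator (update is scaled by alpha / r)
    "MODEL_MAX_LEN": 1024,  # maximum sequence length, in tokens
}
```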
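The settings above imply a few concrete numbers, worked out in the sketch below. It assumes standard LoRA (the update `B @ A` is scaled by `alpha / r`), a language-model hidden size of 4096 as in LLaVA-1.5 7B, and 576 image tokens per image (CLIP ViT-L/14 at 336 px, i.e. a 24 x 24 patch grid). Which projections the adapter targets is not stated here, so the parameter count covers one 4096 x 4096 projection only, and the token budget assumes `MODEL_MAX_LEN` includes the image tokens. The helper names are hypothetical.

```python
# Same values as the configuration above.
config = {
    "USE_LORA": True,
    "LORA_R": 8,
    "LORA_ALPHA": 8,
    "MODEL_MAX_LEN": 1024,
}

HIDDEN_SIZE = 4096               # hidden size of the 7B language model in LLaVA-1.5
IMAGE_TOKENS = (336 // 14) ** 2  # 576 patch tokens from CLIP ViT-L/14 at 336 px


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return cfg["LORA_ALPHA"] / cfg["LORA_R"]


def lora_params_per_matrix(cfg: dict, d_in: int, d_out: int) -> int:
    """One LoRA pair adds A (r x d_in) and B (d_out x r) beside a frozen weight."""
    r = cfg["LORA_R"]
    return r * d_in + d_out * r


def text_token_budget(cfg: dict, image_tokens: int = IMAGE_TOKENS) -> int:
    """Tokens left for text, assuming the length limit also counts image tokens."""
    return cfg["MODEL_MAX_LEN"] - image_tokens


print(lora_scaling(config))                                      # 1.0
print(lora_params_per_matrix(config, HIDDEN_SIZE, HIDDEN_SIZE))  # 65536
print(text_token_budget(config))                                 # 448
```

With `LORA_ALPHA` equal to `LORA_R`, the scaling factor is 1.0, so the learned update is added to the frozen weights unscaled. Each adapted 4096 x 4096 projection adds 65,536 trainable parameters, a small fraction of its roughly 16.8 million frozen weights.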

Contact

{yixuan.yang,yueqian.lin}@duke.edu