---
license: apache-2.0
tags:
  - multimodal
  - vision-language
  - video understanding
  - spatial reasoning
  - visuospatial cognition
  - llava
  - qwen
  - llava-video
datasets:
  - nkkbr/ViCA-322K
  - nkkbr/ViCA-thinking-2.68k
language:
  - en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA-ScanNetPP-7B
base_model: lmms-lab/LLaVA-Video-7B-Qwen2
---
## Usage and Full Documentation

For the full model description, training setup, datasets, evaluation results, and inference code, **please refer to the main ViCA-7B README**:

[**nkkbr/ViCA**](https://huggingface.co/nkkbr/ViCA)