---
license: apache-2.0
pipeline_tag: image-text-to-text
---

**<center><span style="font-size:2em;">TinyLLaVA-Video</span></center>**

[![arXiv](https://img.shields.io/badge/Arxiv-2501.15513-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.15513) [![Github](https://img.shields.io/badge/Github-Github-blue.svg)](https://github.com/ZhangXJ199/TinyLLaVA-Video)

For training data, we combine portions of two datasets: [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K) and [Valley](https://github.com/RupertLuo/Valley).

|   Stage  |            Source             |    #Sample    |
|----------| :---------------------------: | :-----------: |
| Pretrain |   LLaVA-Video-178K + Valley   |     397k      |
| Finetune |       LLaVA-Video-178K        |     491k      |

#### Pretrain Data

We use four subsets of [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K): ``0_30_s_academic_v0_1``, ``30_60_s_academic_v0_1``, ``0_30_s_youtube_v0_1``, and ``30_60_s_youtube_v0_1``, supplemented with filtered data from [Video-LLaVA](https://huggingface.co/datasets/LanguageBind/Video-LLaVA).

We provide the cleaned annotation files; the raw videos can be downloaded from [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K) and [Video-LLaVA](https://huggingface.co/datasets/LanguageBind/Video-LLaVA).
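
If you fetch the subsets programmatically, a minimal sketch using ``huggingface_hub`` might look like the one below. The ``allow_patterns`` globs assume each subset lives in a folder named after its split, so adjust them to the actual repository layout; the same pattern works for the [Video-LLaVA](https://huggingface.co/datasets/LanguageBind/Video-LLaVA) repository.

```python
# Minimal download sketch (assumes: pip install huggingface_hub, and that each
# subset is stored in a folder named after the split -- adjust if the layout differs).
from huggingface_hub import snapshot_download

subsets = [
    "0_30_s_academic_v0_1",
    "30_60_s_academic_v0_1",
    "0_30_s_youtube_v0_1",
    "30_60_s_youtube_v0_1",
]

snapshot_download(
    repo_id="lmms-lab/LLaVA-Video-178K",
    repo_type="dataset",
    local_dir="path/to/your/dataset",            # matches the layout described below
    allow_patterns=[f"{s}/*" for s in subsets],  # assumed folder names
)
```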

#### Finetune Data

We use four subsets of [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K): ``0_30_s_academic_v0_1``, ``30_60_s_academic_v0_1``, ``0_30_s_youtube_v0_1``, and ``30_60_s_youtube_v0_1``. 

We provide the cleaned annotation files; the raw videos can be downloaded from [LLaVA-Video-178K](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K).

#### Organize Data

Organize the video files and annotation files as follows in ``path/to/your/dataset``:

```Shell
dataset
β”œβ”€β”€ academic_source
β”œβ”€β”€ liwei_youtube_videos
β”œβ”€β”€ valley
β”œβ”€β”€ text_files
β”‚   β”œβ”€β”€ cleaned_video_caption.json
β”‚   β”œβ”€β”€ cleaned_video_openqa.json
```
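
As a quick sanity check, a short sketch like the one below can confirm the layout is in place; the paths are taken directly from the tree above.

```python
# Verify that the expected directories and annotation files exist
# (paths taken from the tree above).
from pathlib import Path

root = Path("path/to/your/dataset")
expected = [
    "academic_source",
    "liwei_youtube_videos",
    "valley",
    "text_files/cleaned_video_caption.json",
    "text_files/cleaned_video_openqa.json",
]
for rel in expected:
    status = "ok" if (root / rel).exists() else "MISSING"
    print(f"{status:8} {rel}")
```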

**Note: If there is any infringement, please contact us for removal.**