---
license: "cc-by-nc-4.0"
tags:
- vision
- video-classification
---

# FAL - Framework for Automated Labeling of Videos (FALVideoClassifier)

FAL (Framework for Automated Labeling of Videos) is a custom video classification model developed by **SVECTOR** and fine-tuned on the **FAL-500** dataset. It is designed for efficient video understanding and classification, leveraging state-of-the-art video processing techniques.

## Model Overview

This model, referred to as `FALVideoClassifier`, is built on a **TimeSformer-based architecture**, fine-tuned on **FAL-500**, and optimized for automated video labeling tasks. It classifies a video into one of the 500 labels in the FAL-500 dataset.

This model was developed by **SVECTOR** as part of our initiative to advance automated video understanding and classification technologies.

## Intended Uses & Limitations

This model is designed for video classification: it assigns a video one of the 500 classes in the FAL-500 dataset. Note that the model was trained on **FAL-500** and may not perform as well on data that differs significantly from it.

### Intended Use:

- Automated video labeling
- Video content classification
- Research in video understanding and machine learning

### Limitations:

- Trained only on FAL-500
- May not generalize well to out-of-domain videos without further fine-tuning
- Requires videos to be pre-processed (e.g., frame resizing and normalization)

## How to Use

To use this model for video classification, follow these steps:

### Installation:

Ensure you have the necessary dependencies installed:

```bash
pip install torch torchvision transformers
```

### Code Example:

Here is an example Python snippet that uses the FAL model to classify a video:

```python
from transformers import AutoImageProcessor, FALVideoClassifierForVideoClassification
import numpy as np
import torch

# Simulate a sample video: 8 RGB frames of size 224x224
video = list(np.random.randn(8, 3, 224, 224))

# Load the image processor and model
processor = AutoImageProcessor.from_pretrained("SVECTOR-CORPORATION/FAL")
model = FALVideoClassifierForVideoClassification.from_pretrained("SVECTOR-CORPORATION/FAL")

# Pre-process the video input
inputs = processor(video, return_tensors="pt")

# Run inference without gradient tracking (evaluation mode)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find the predicted class (highest logit)
predicted_class_idx = logits.argmax(-1).item()

# Output the predicted label
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
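
To inspect more than the single best label, the logits can be converted to probabilities. A short follow-up to the snippet above, reusing its `logits` and `model`:

```python
# Convert logits to probabilities and print the top-5 predictions
# (reuses `logits` and `model` from the snippet above).
probs = logits.softmax(dim=-1)
top_probs, top_ids = probs.topk(5, dim=-1)

for p, i in zip(top_probs[0], top_ids[0]):
    print(f"{model.config.id2label[i.item()]}: {p.item():.3f}")
```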

### Model Details:

- **Model Name**: `FALVideoClassifier`
- **Dataset Used**: FAL-500
- **Input Size**: 8 frames of size 224x224 with 3 color channels (RGB)

### Configuration:

The `FALVideoClassifier` uses the following hyperparameters (see the sketch after this list):

- `num_frames`: Number of frames per video (e.g., 8)
- `num_labels`: Number of possible video classes (500 for FAL-500)
- `hidden_size`: Hidden size of the transformer layers (768)
- `attention_probs_dropout_prob`: Dropout probability for attention layers (0.0)
- `hidden_dropout_prob`: Dropout probability for hidden layers (0.0)
- `drop_path_rate`: Drop rate for stochastic depth (0.0)
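
If the checkpoint follows the usual `transformers` configuration pattern, these values can be read from the checkpoint rather than hard-coded. A minimal sketch, assuming the repository exposes a transformers-style config via `AutoConfig` (an assumption, not something this card confirms):

```python
from transformers import AutoConfig

# Load the configuration shipped with the checkpoint.
# Assumption: the repo exposes a transformers-style config;
# a custom architecture may additionally require trust_remote_code=True.
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/FAL")

print(config.num_frames)   # expected: 8
print(config.num_labels)   # expected: 500 (FAL-500)
print(config.hidden_size)  # expected: 768
```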

### Preprocessing:

Before feeding videos into the model, ensure the frames are properly pre-processed (a sketch follows this list):

- Resize frames to `224x224`
- Normalize pixel values (use the processor from the model, as shown in the code above)
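
When starting from a raw video file rather than in-memory arrays, the frame sampling and resizing can be done with standard tooling before applying the processor. A hedged sketch using `torchvision`; the file path and the uniform 8-frame sampling are illustrative assumptions, and normalization is left to the model's processor, as shown earlier:

```python
import torch
from torchvision.io import read_video
from torchvision.transforms.functional import resize

# Decode a video file into a (num_frames, H, W, C) uint8 tensor.
# "example.mp4" is a placeholder path.
frames, _, _ = read_video("example.mp4", pts_unit="sec")

# Uniformly sample 8 frames across the clip.
indices = torch.linspace(0, frames.shape[0] - 1, steps=8).long()
clip = frames[indices]

# Reorder to (frames, channels, height, width) and resize to 224x224.
clip = resize(clip.permute(0, 3, 1, 2), [224, 224], antialias=True)

# A list of 8 frame tensors, ready for the processor shown above.
video = list(clip)
```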

## License

This model is licensed under **CC-BY-NC-4.0**: it may be used for non-commercial purposes with proper attribution.

## Citation

If you use this model in your research or projects, please cite the following:

```bibtex
@inproceedings{bertasius2021space,
  title={Is Space-Time Attention All You Need for Video Understanding?},
  author={Bertasius, Gedas and Wang, Heng and Torresani, Lorenzo},
  booktitle={International Conference on Machine Learning},
  pages={813--824},
  year={2021},
  organization={PMLR}
}
```

## Contact

For any inquiries regarding this model or its implementation, contact the SVECTOR team at [email protected].

---