SVECTOR-OFFICIAL committed · verified · commit af5b802 · parent 2835421

Update README.md

---
license: cc-by-nc-4.0
tags:
- vision
- video-classification
---

# FAL - Framework for Automated Labeling of Videos (FALVideoClassifier)

FAL (Framework for Automated Labeling of Videos) is a custom video classification model developed by **SVECTOR** and fine-tuned on the **FAL-500** dataset. It is designed for efficient video understanding and classification, building on space-time attention techniques for video processing.

## Model Overview

This model, referred to as `FALVideoClassifier`, is built on a **TimeSformer-based architecture**, fine-tuned on **FAL-500**, and optimized for automated video labeling tasks. It classifies a video into one of the 500 labels from the FAL-500 dataset.

This model was developed by **SVECTOR** as part of our initiative to advance automated video understanding and classification technologies.

## Intended Uses & Limitations

This model is designed for video classification: you can use it to classify videos into one of the 500 classes from the FAL-500 dataset. Note that the model was trained on **FAL-500** and may not perform as well on datasets that differ significantly from it.

### Intended Use:
- Automated video labeling
- Video content classification
- Research in video understanding and machine learning

### Limitations:
- Trained only on FAL-500
- May not generalize well to out-of-domain videos without further fine-tuning
- Requires videos to be pre-processed (resizing frames, normalization, etc.)

## How to Use

To use this model for video classification, follow these steps:

### Installation:

Ensure you have the necessary dependencies installed:

```bash
pip install torch torchvision transformers
```

### Code Example:

Here is an example Python code snippet for using the FAL model to classify a video:

```python
from transformers import AutoImageProcessor, FALVideoClassifierForVideoClassification
import numpy as np
import torch

# Simulate a sample video (8 frames of size 224x224 with 3 color channels)
video = list(np.random.randn(8, 3, 224, 224))

# Load the image processor and model
processor = AutoImageProcessor.from_pretrained("SVECTOR-CORPORATION/FAL")
model = FALVideoClassifierForVideoClassification.from_pretrained("SVECTOR-CORPORATION/FAL")

# Pre-process the video input
inputs = processor(video, return_tensors="pt")

# Run inference with no gradient calculation (evaluation mode)
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Find the predicted class (highest logit)
predicted_class_idx = logits.argmax(-1).item()

# Output the predicted label
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
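
The example above feeds random tensors in place of a real clip. As a minimal sketch of running the same pipeline on an actual video file (assumptions: `torchvision` with the PyAV backend is installed via `pip install av`, the processor accepts a list of `(H, W, C)` frame arrays as Hugging Face video processors commonly do, and `example.mp4` is a placeholder path):

```python
import torch
from torchvision.io import read_video

# Decode the clip; frames come back as a (T, H, W, C) uint8 tensor.
frames, _, _ = read_video("example.mp4", pts_unit="sec")

# Uniformly sample the 8 frames the model expects.
indices = torch.linspace(0, frames.shape[0] - 1, steps=8).long()
video = [frames[i].numpy() for i in indices]  # list of (H, W, C) arrays

# Reuse the processor and model loaded in the example above.
inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("Predicted class:", model.config.id2label[logits.argmax(-1).item()])
```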

### Model Details:

- **Model Name**: `FALVideoClassifier`
- **Dataset Used**: FAL-500
- **Input Size**: 8 frames of size 224x224 with 3 color channels (RGB)
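
As a quick sanity check on the input size, the tensor produced by the processor in the code example above should have shape (batch, frames, channels, height, width); the key name `pixel_values` is the common convention for transformers video models and an assumption here:

```python
# Expected: torch.Size([1, 8, 3, 224, 224]) for one 8-frame RGB clip.
print(inputs["pixel_values"].shape)
```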

### Configuration:

The `FALVideoClassifier` uses the following hyperparameters (a sketch of reading them back from the checkpoint follows the list):

- `num_frames`: Number of frames in the video (e.g., 8)
- `num_labels`: Number of possible video classes (500 for FAL-500)
- `hidden_size`: Hidden size of the transformer layers (768)
- `attention_probs_dropout_prob`: Dropout probability for attention layers (0.0)
- `hidden_dropout_prob`: Dropout probability for the hidden layers (0.0)
- `drop_path_rate`: Drop rate for stochastic depth (0.0)
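
The sketch below reads these values from the checkpoint's configuration. It assumes the config exposes the fields under exactly the names listed above, which is the usual convention for TimeSformer-style configs but is not confirmed for this checkpoint:

```python
from transformers import AutoConfig

# Load the configuration shipped with the checkpoint.
config = AutoConfig.from_pretrained("SVECTOR-CORPORATION/FAL")

# Field names assumed to match the hyperparameter list above.
print("num_frames:", config.num_frames)                            # e.g. 8
print("num_labels:", config.num_labels)                            # 500 for FAL-500
print("hidden_size:", config.hidden_size)                          # 768
print("attention dropout:", config.attention_probs_dropout_prob)   # 0.0
print("hidden dropout:", config.hidden_dropout_prob)               # 0.0
print("drop path rate:", config.drop_path_rate)                    # 0.0
```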

### Preprocessing:

Before feeding videos into the model, ensure the frames are properly pre-processed (a manual sketch follows the list):

- Resize frames to `224x224`
- Normalize pixel values (use the processor from the model, as shown in the code)
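
The bundled processor handles both steps and is the source of truth for the exact constants. If you need to replicate them manually in your own pipeline, a minimal sketch might look like the following; the ImageNet mean/std values are an assumption, not confirmed for this checkpoint:

```python
import torch
import torchvision.transforms.functional as F

def preprocess_frame(frame: torch.Tensor) -> torch.Tensor:
    """Resize one (C, H, W) uint8 frame to 224x224 and normalize it."""
    frame = F.resize(frame, [224, 224])  # resize to the expected input size
    frame = frame.float() / 255.0        # scale pixel values to [0, 1]
    # Standard ImageNet statistics; verify against the model's processor config.
    return F.normalize(frame,
                       mean=[0.485, 0.456, 0.406],
                       std=[0.229, 0.224, 0.225])
```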

## License

This model is licensed under the **CC-BY-NC-4.0** license, which means it can be used for non-commercial purposes with proper attribution.

## Citation

If you use this model in your research or projects, please cite the following:

```bibtex
@inproceedings{bertasius2021space,
  title={Is Space-Time Attention All You Need for Video Understanding?},
  author={Bertasius, Gedas and Wang, Heng and Torresani, Lorenzo},
  booktitle={International Conference on Machine Learning},
  pages={813--824},
  year={2021},
  organization={PMLR}
}
```

## Contact

For inquiries regarding this model or its implementation, contact the SVECTOR team at [email protected].

---