Jeckmu committed
Commit 9066789 · verified · 1 Parent(s): 424baac

Update README.md

Files changed (1):
  README.md: +42 -6
README.md CHANGED
@@ -7,28 +7,64 @@ tags:
  - lora
  - generated_from_trainer
  model-index:
- - name: 250210_Abroad_LoRA_2B_INT4
+ - name: Qwen2-VL-2B-Instruct-GPTQ-Int4-LoRA-SurveillanceVideo-Classification-250205
   results: []
+ pipeline_tag: video-classification
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # 250210_Abroad_LoRA_2B_INT4
+ # Qwen2-VL-2B-Instruct-GPTQ-Int4-LoRA-SurveillanceVideo-Classification-250205
 
- This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4) on the qwen2_vl_dora dataset.
+ This model is a fine-tuned version of [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) on the Surveillance Video Classification dataset.
 
  ## Model description
 
- More information needed
+ This model takes a video as input and classifies it into one of the following six classes:
+ [1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]
+
+ LLaMA-Factory was used for training, with the hyperparameters described below.
 
  ## Intended uses & limitations
 
- More information needed
+ This model was fine-tuned with the prompt below; use the same prompt when running inference.
+ ```python
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "video",
+                 "video": video_path,
+                 "max_pixels": 640 * 360,
+                 # "fps": 1.0  # default fps may be 1.0
+             },
+             {
+                 "type": "text",
+                 "text": (
+                     "<video>\nWatch the video and choose the six behaviours that apply to you. "
+                     "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, 5. fighting, 6. arson]. "
+                     "Your answer must be a single digit, the number of the behaviour."
+                 )
+             }
+         ]
+     }
+ ]
+ ```
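At inference time this prompt plugs into the usual Qwen2-VL `transformers` pipeline. The sketch below is a minimal, unofficial example: it assumes the standard Qwen2-VL API in `transformers`, `qwen_vl_utils` for video loading, `peft` for attaching the LoRA adapter to the GPTQ-Int4 base, and a placeholder adapter id (the card does not state the adapter repo/path).

```python
# Minimal inference sketch. Assumptions: standard Qwen2-VL transformers API,
# qwen_vl_utils for video loading, peft to attach the LoRA adapter to the GPTQ-Int4 base,
# and a placeholder adapter id (replace with the actual repo/path of this adapter).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info

base_id = "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4"
adapter_id = "path/to/this-lora-adapter"  # placeholder, not confirmed by the card

model = Qwen2VLForConditionalGeneration.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA weights
processor = AutoProcessor.from_pretrained(base_id)

video_path = "example_clip.mp4"  # hypothetical 10-second surveillance clip
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": video_path, "max_pixels": 640 * 360},
            {
                "type": "text",
                "text": (
                    "<video>\nWatch the video and choose the six behaviours that apply to you. "
                    "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, "
                    "5. fighting, 6. arson]. Your answer must be a single digit, the number of the behaviour."
                ),
            },
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

# The model is tuned to answer with a single digit (1-6), so a few new tokens suffice.
output_ids = model.generate(**inputs, max_new_tokens=4)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(answer)  # e.g. "5" -> fighting
```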
 
  ## Training and evaluation data
 
+ The training data was sampled from the original video dataset so that the classes are balanced, using 100 videos per class
+ (except for the 6. arson class, which had only 65 videos).
+
+ Each video was preprocessed to a resolution of 640x360 with fps=3.0,
+ and the 10-second segment in which the behavior occurs (according to the metadata) was cut out and used for training
+ (so about 30 frames per video in total).
+
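The clip preparation described above can be reproduced with ffmpeg; the sketch below is illustrative only, and the start time of the behaviour is a hypothetical input rather than a field defined by this card.

```python
# Cut a 10-second segment around the annotated behaviour and rescale it to 640x360 at 3 fps.
# Assumes ffmpeg is installed; file paths and the start time are illustrative.
import subprocess

def cut_clip(src_path: str, start_sec: float, dst_path: str) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start_sec),          # start of the annotated behaviour
            "-i", src_path,
            "-t", "10",                     # keep a 10-second segment
            "-vf", "scale=640:360,fps=3",   # 640x360 resolution, 3 frames per second
            "-an",                          # audio is not used
            dst_path,
        ],
        check=True,
    )

# e.g. cut_clip("raw/cam01.mp4", start_sec=42.0, dst_path="clips/cam01_fighting.mp4")
# -> roughly 30 frames per clip, matching the description above
```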
+ At inference time, you can use the same prompt as above.
+ For training, the same prompt format was used, with the class number appended as the answer.
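As an illustration of that training format, here is one plausible record, loosely following LLaMA-Factory's `mllm_video_demo.json` layout; the exact keys depend on how the dataset is registered in `dataset_info.json`, which the card does not show.

```python
# One plausible training example; the clip path and the label are illustrative.
record = {
    "messages": [
        {
            "role": "user",
            "content": (
                "<video>\nWatch the video and choose the six behaviours that apply to you. "
                "[1. loitering, 2. breaking and entering, 3. abandonment, 4. falling down, "
                "5. fighting, 6. arson]. Your answer must be a single digit, the number of the behaviour."
            ),
        },
        {"role": "assistant", "content": "5"},  # ground-truth class number, e.g. 5 = fighting
    ],
    "videos": ["clips/cam01_fighting.mp4"],  # the preprocessed 10-second clip
}
```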
 
  ## Training procedure