---
license: mit
datasets:
- mjjung/ActivityNet-VTune
language:
- en
base_model:
- ShuhuaiRen/TimeChat-7b
---

# TimeChat-7B-Charades-VTune Model

## Model details

We trained [TimeChat](https://arxiv.org/abs/2312.02051) with VTune, an instruction-tuning method we developed specifically to account for consistency in temporal comprehension.

For tuning, we used 10K training videos from Charades-STA with 205K automatically generated annotations.

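The released weights are intended to be used with the code linked below. As a minimal sketch, the checkpoint can be fetched with `huggingface_hub`; the repo id here is an assumption based on this card's name and may need adjusting.

```python
# Minimal sketch: download the fine-tuned checkpoint from the Hugging Face Hub.
# The repo id below is assumed from this card's name; adjust it if it differs.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(repo_id="mjjung/TimeChat-7B-Charades-VTune")
print("Checkpoint downloaded to:", ckpt_dir)

# Point the evaluation/inference config of the TimeChat-based codebase
# (see the Code link below) at `ckpt_dir` to load the VTune-tuned weights.
```
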
## Evaluation
We evaluated the model on ActivityNet-CON and ActivityNet-Captions.

- ActivityNet-CON

| Metric | Value |
|----------|-------------|
| Ground | 37.4 |
| R-Ground | 28.3 (75.6) |
| S-Ground | 10.6 (28.3) |
| H-Verify | 19.6 (52.3) |
| C-Verify | 19.5 (51.5) |

- ActivityNet-Captions

| Metric | Value |
|-------------|---------|
| R@1 IoU=0.3 | 57.74 |
| R@1 IoU=0.5 | 41.05 |
| R@1 IoU=0.7 | 23.72 |
| mIoU | 40.89 |

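The R@1 and mIoU entries follow the usual temporal-grounding convention: temporal IoU is the overlap between the predicted and ground-truth segments divided by their combined span, R@1 at a threshold is the percentage of queries whose top-1 prediction reaches that IoU, and mIoU averages the IoU over all queries. The sketch below illustrates that computation on hypothetical segments; it is not the paper's evaluation script.

```python
# Sketch of the standard temporal grounding metrics (R@1 at IoU thresholds, mIoU).
# The segment lists at the bottom are hypothetical toy inputs; the official
# numbers come from the evaluation code linked in this card.

def temporal_iou(pred, gt):
    """IoU between two (start, end) segments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def grounding_metrics(preds, gts, thresholds=(0.3, 0.5, 0.7)):
    """preds/gts: lists of (start, end) pairs, one top-1 prediction per query."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    recall = {t: 100.0 * sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    miou = 100.0 * sum(ious) / len(ious)
    return recall, miou

# Toy example with two queries.
preds = [(5.0, 20.0), (12.0, 30.0)]
gts = [(6.0, 18.0), (0.0, 10.0)]
print(grounding_metrics(preds, gts))
```
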
**Paper and code for more information:**
[Paper](https://arxiv.org/abs/2411.12951), [Code](https://github.com/minjoong507/consistency-of-video-llm)

## Citation
If you find our research and code useful, please consider starring our repository and citing our paper:

```
@article{jung2024consistency,
  title={On the Consistency of Video Large Language Models in Temporal Comprehension},
  author={Jung, Minjoon and Xiao, Junbin and Zhang, Byoung-Tak and Yao, Angela},
  journal={arXiv preprint arXiv:2411.12951},
  year={2024}
}
```