daniel3303 committed · Commit 6b6d58f · verified · 1 Parent(s): 5e37d23

Update README.md

Files changed (1)
  1. README.md +57 -4
README.md CHANGED
@@ -1,11 +1,64 @@
  ---
  base_model: mistral-community/pixtral-12b
  library_name: peft
  ---

  # Model Card for Pixtral-GroundCap

- This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (<gdo>), actions (<gda>), and locations (<gdl>) to specific regions in images.

  ## Model Details

@@ -33,9 +86,9 @@ This model is a fine-tuned version of Pixtral-12B optimized for grounded image c
  ### Direct Use

  The model is designed for generating grounded image captions that explicitly link textual descriptions to visual elements using three types of grounding tags:
- - <gdo> for objects
- - <gda> for actions
- - <gdl> for locations

  Each tag maintains object identity through unique IDs, enabling consistent reference tracking throughout the caption.
 
  ---
  base_model: mistral-community/pixtral-12b
  library_name: peft
+ license: cc-by-4.0
+ datasets:
+ - daniel3303/GroundCap
+ language:
+ - en
+ metrics:
+ - bleu
+ - meteor
+ - cider
+ - spice
+ - f1
+ - recall
+ - precision
+ - gmeteor
+ - rouge
+
+ model-index:
+ - name: PixtralGroundCap
+   results:
+   - task:
+       type: image-captioning
+       subtype: grounded-image-captioning
+     dataset:
+       name: daniel3303/GroundCap
+       type: grounded-image-captioning
+       split: test
+     metrics:
+     - name: Precision
+       type: grounding-precision
+       value: 0.58
+     - name: Recall
+       type: grounding-recall
+       value: 0.96
+     - name: F1
+       type: grounding-f1
+       value: 0.69
+     - name: BLEU-4
+       type: bleu-4
+       value: 0.19
+     - name: METEOR
+       type: meteor
+       value: 0.23
+     - name: CIDEr
+       type: cider
+       value: 0.51
+     - name: SPICE
+       type: spice
+       value: 0.30
+     - name: gMETEOR
+       type: gmeteor
+       value: 0.35
+
+
  ---

  # Model Card for Pixtral-GroundCap

+ This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (`<gdo>`), actions (`<gda>`), and locations (`<gdl>`) to specific regions in images.

  ## Model Details

 
  ### Direct Use

  The model is designed for generating grounded image captions that explicitly link textual descriptions to visual elements using three types of grounding tags:
+ - `<gdo>` for objects
+ - `<gda>` for actions
+ - `<gdl>` for locations

  Each tag maintains object identity through unique IDs, enabling consistent reference tracking throughout the caption.
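
The grounding tags described in the README can be pulled out of a generated caption with a small parser. The sketch below is illustrative only: the README text in this diff does not spell out the attribute syntax inside a tag, so the sample caption (a `class` attribute followed by an ID such as `person-0`) is an assumption, and the helper names `extract_groundings` and `strip_tags` are hypothetical. The regex only relies on the three tag names and does not handle nested tags.

```python
import re

# Matches <gdo ...>text</gdo>, <gda ...>text</gda>, <gdl ...>text</gdl>.
# Group 1: tag name, group 2: raw attribute string, group 3: tagged text.
TAG_RE = re.compile(r"<(gdo|gda|gdl)\b([^>]*)>(.*?)</\1>", re.DOTALL)


def extract_groundings(caption):
    """Return (tag, attributes, text) triples in order of appearance."""
    return [
        (m.group(1), m.group(2).strip(), m.group(3).strip())
        for m in TAG_RE.finditer(caption)
    ]


def strip_tags(caption):
    """Return the caption with grounding tags removed, keeping the tagged text."""
    return TAG_RE.sub(lambda m: m.group(3), caption)


# Hypothetical caption; the attribute layout is assumed, not taken from the README.
caption = (
    '<gdo class="person" person-0>A man</gdo> '
    '<gda class="walk" person-0>walks</gda> along '
    '<gdl class="street" street-0>the street</gdl>.'
)

for tag, attrs, text in extract_groundings(caption):
    print(tag, "|", attrs, "|", text)
print(strip_tags(caption))
```

Keeping the attribute string raw, rather than parsing it, avoids committing to an ID format the README does not define; a stricter parser could be layered on once the exact syntax is confirmed against the GroundCap dataset.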