Update README.md
README.md
CHANGED
@@ -1,11 +1,64 @@
 ---
 base_model: mistral-community/pixtral-12b
 library_name: peft
+license: cc-by-4.0
+datasets:
+- daniel3303/GroundCap
+language:
+- en
+metrics:
+- bleu
+- meteor
+- cider
+- spice
+- f1
+- recall
+- precision
+- gmeteor
+- rouge
+
+model-index:
+- name: PixtralGroundCap
+  results:
+  - task:
+      type: image-captioning
+      subtype: grounded-image-captioning
+    dataset:
+      name: daniel3303/GroundCap
+      type: grounded-image-captioning
+      split: test
+    metrics:
+    - name: Precision
+      type: grounding-precision
+      value: 0.58
+    - name: Recall
+      type: grounding-recall
+      value: 0.96
+    - name: F1
+      type: grounding-f1
+      value: 0.69
+    - name: BLEU-4
+      type: bleu-4
+      value: 0.19
+    - name: METEOR
+      type: meteor
+      value: 0.23
+    - name: CIDEr
+      type: cider
+      value: 0.51
+    - name: SPICE
+      type: spice
+      value: 0.30
+    - name: gMETEOR
+      type: gmeteor
+      value: 0.35
+
+
 ---
 
 # Model Card for Pixtral-GroundCap
 
-This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (
+This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (`<gdo>`), actions (`<gda>`), and locations (`<gdl>`) to specific regions in images.
 
 ## Model Details
 
@@ -33,9 +86,9 @@ This model is a fine-tuned version of Pixtral-12B optimized for grounded image c
 ### Direct Use
 
 The model is designed for generating grounded image captions that explicitly link textual descriptions to visual elements using three types of grounding tags:
--
--
--
+- `<gdo>` for objects
+- `<gda>` for actions
+- `<gdl>` for locations
 
 Each tag maintains object identity through unique IDs, enabling consistent reference tracking throughout the caption.
 
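The tag scheme described in the Direct Use hunk can be illustrated with a short parsing sketch. The README names the three tags (`<gdo>`, `<gda>`, `<gdl>`) and says each carries a unique ID, but it does not show the attribute syntax, so the `id="..."` form in the sample caption below is an assumption made for illustration only. To stay neutral, the regular expression treats whatever appears between the tag name and `>` as an opaque identifier string; the helper names `extract_groundings` and `group_by_id` are hypothetical.

```python
import re
from collections import defaultdict

# Matches <gdo ...>text</gdo>, <gda ...>text</gda>, and <gdl ...>text</gdl>.
# The attribute part is captured as an opaque string because the exact
# attribute syntax is an assumption, not something the README specifies.
TAG_RE = re.compile(r"<(gdo|gda|gdl)\b([^>]*)>(.*?)</\1>", re.DOTALL)


def extract_groundings(caption):
    """Return (tag, attrs, text) tuples in the order they appear."""
    return [
        (m.group(1), m.group(2).strip(), m.group(3).strip())
        for m in TAG_RE.finditer(caption)
    ]


def group_by_id(groundings):
    """Group grounded phrases by tag type and raw attribute string."""
    groups = defaultdict(list)
    for tag, attrs, text in groundings:
        groups[(tag, attrs)].append(text)
    return dict(groups)


# Hypothetical caption; the id="..." attribute form is illustrative only.
caption = (
    '<gdo id="person-0">A man</gdo> <gda id="person-0">walks</gda> across '
    '<gdl id="street-0">a street</gdl> while <gdo id="person-0">he</gdo> waves.'
)

groundings = extract_groundings(caption)
groups = group_by_id(groundings)
```

Grouping by the raw attribute string is one way to check the "consistent reference tracking" property: both object mentions in the sample caption share one key, so they resolve to the same entity. The sketch does not handle nested tags of the same type.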