daniel3303/PixtralGroundCap


daniel3303 committed on Feb 10

Commit 6b6d58f (verified) · 1 parent: 5e37d23

Update README.md

Files changed (1)

README.md +57 -4

README.md CHANGED

@@ -1,11 +1,64 @@
 ---
 base_model: mistral-community/pixtral-12b
 library_name: peft
+license: cc-by-4.0
+datasets:
+- daniel3303/GroundCap
+language:
+- en
+metrics:
+- bleu
+- meteor
+- cider
+- spice
+- f1
+- recall
+- precision
+- gmeteor
+- rouge
+model-index:
+  - name: PixtralGroundCap
+    results:
+      - task:
+          type: image-captioning
+          subtype: grounded-image-captioning
+        dataset:
+          name: daniel3303/GroundCap
+          type: grounded-image-captioning
+          split: test
+        metrics:
+          - name: Precision
+            type: grounding-precision
+            value: 0.58
+          - name: Recall
+            type: grounding-recall
+            value: 0.96
+          - name: F1
+            type: grounding-f1
+            value: 0.69
+          - name: BLEU-4
+            type: bleu-4
+            value: 0.19
+          - name: METEOR
+            type: meteor
+            value: 0.23
+          - name: CIDEr
+            type: cider
+            value: 0.51
+          - name: SPICE
+            type: spice
+            value: 0.30
+          - name: gMETEOR
+            type: gmeteor
+            value: 0.35
 ---
 # Model Card for Pixtral-GroundCap
-This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (<gdo>), actions (<gda>), and locations (<gdl>) to specific regions in images.
+This model is a fine-tuned version of Pixtral-12B optimized for grounded image captioning. It generates detailed image descriptions with explicit grounding tags that link textual descriptions to specific visual elements in the image. The model was trained on the GroundCap dataset and uses a novel tag system to ground objects (`<gdo>`), actions (`<gda>`), and locations (`<gdl>`) to specific regions in images.
 ## Model Details
@@ -33,9 +86,9 @@ This model is a fine-tuned version of Pixtral-12B optimized for grounded image c
 ### Direct Use
 The model is designed for generating grounded image captions that explicitly link textual descriptions to visual elements using three types of grounding tags:
-- <gdo> for objects
-- <gda> for actions
-- <gdl> for locations
+- `<gdo>` for objects
+- `<gda>` for actions
+- `<gdl>` for locations
 Each tag maintains object identity through unique IDs, enabling consistent reference tracking throughout the caption.
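The tag scheme described in the model card (`<gdo>` for objects, `<gda>` for actions, `<gdl>` for locations, each carrying unique IDs) can be illustrated with a small parser that pulls out grounded spans and groups them by ID to follow one entity through a caption. This is a minimal sketch, not code from the model or dataset release: the card does not spell out the tag attribute syntax, so the example assumes a hypothetical form such as `<gdo class="person" person-0>A man</gdo>` (an optional `class` attribute followed by one or more bare IDs) and assumes tags are not nested. The helper names `parse_grounded_caption` and `mentions_by_id`, as well as the example caption, are illustrative only.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# ASSUMPTION (hypothetical syntax): a grounding tag looks like
#   <gdo class="person" person-0>A man</gdo>
# i.e. an optional class="..." attribute followed by one or more bare IDs.
# The real GroundCap tag syntax may differ; adjust the regexes if so.
# Tags are assumed not to be nested inside one another.
TAG_RE = re.compile(r"<(gdo|gda|gdl)\b([^>]*)>(.*?)</\1>", re.DOTALL)
CLASS_RE = re.compile(r'class="([^"]*)"')
ID_RE = re.compile(r"\b\w+-\d+\b")

# Tag name -> kind of visual element it grounds (as listed in the model card).
KIND = {"gdo": "object", "gda": "action", "gdl": "location"}


@dataclass
class GroundedSpan:
    kind: str             # "object", "action" or "location"
    label: Optional[str]  # value of class="..." if present
    ids: List[str]        # IDs linking the span to image regions
    text: str             # caption text inside the tag


def parse_grounded_caption(caption: str) -> Tuple[List[GroundedSpan], str]:
    """Return grounded spans in caption order, plus the caption with tags removed."""
    spans = []
    for match in TAG_RE.finditer(caption):
        tag, attrs, text = match.groups()
        label_match = CLASS_RE.search(attrs)
        # Drop class="..." first so its value is never mistaken for an ID.
        ids = ID_RE.findall(CLASS_RE.sub("", attrs))
        label = label_match.group(1) if label_match else None
        spans.append(GroundedSpan(KIND[tag], label, ids, text))
    # Replace every tag with its inner text to get a plain caption.
    plain = TAG_RE.sub(lambda m: m.group(3), caption)
    return spans, plain


def mentions_by_id(spans: List[GroundedSpan]) -> Dict[str, List[Tuple[str, str]]]:
    """Group every mention by ID to track one entity through the caption."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for span in spans:
        for ident in span.ids:
            groups[ident].append((span.kind, span.text))
    return dict(groups)


# Hypothetical caption written for this example, not actual model output.
EXAMPLE_CAPTION = (
    '<gdo class="person" person-0>A man</gdo> '
    '<gda class="walk" person-0>walks</gda> his '
    '<gdo class="dog" dog-0>dog</gdo> along '
    '<gdl class="road" road-0>the road</gdl>. '
    '<gdo class="person" person-0>He</gdo> holds the leash.'
)

if __name__ == "__main__":
    spans, plain = parse_grounded_caption(EXAMPLE_CAPTION)
    print(plain)
    for ident, mentions in mentions_by_id(spans).items():
        print(ident, mentions)
```

Run as a script, it prints `A man walks his dog along the road. He holds the leash.` and groups `person-0` as `[('object', 'A man'), ('action', 'walks'), ('object', 'He')]`, showing how a shared ID ties the noun phrase, the pronoun, and the action back to the same entity. Parsing and ID grouping are kept as separate functions so the grouping step does not depend on whichever tag syntax the regexes are adapted to.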