Adirazgold committed on
Commit 60f7cbf · verified · 1 Parent(s): 99e3481

Update README.md

Files changed (1)
  1. README.md +20 -14
README.md CHANGED
@@ -17,23 +17,23 @@ We evaluated granite-vision-3.3-2b-embedding alongside other top colBERT style m
17
  ## **NDCG@5 - ViDoRe V2**
18
  | Collection \ Model | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
19
  |----------------------------------------|--------------|------------------|-------------|-------------------|-----------
20
- | ESG Restaurant Human | 51.10 | 68.40 | 65.80 | 62.4 | 62.30 |
21
- | Economics Macro Multilingual | 49.90 | 56.50 | 55.40 | 47.4 | 48.30 |
22
- | MIT Biomedical | 59.70 | 63.60 | 63.50 | 58.1 |60.00 |
23
- | ESG Restaurant Synthetic | 57.00 | 57.40 | 56.60 | 51.1 |54.00 |
24
- | ESG Restaurant Synthetic Multilingual | 55.70 | 57.40 | 57.20 | 47.6 |53.50 |
25
- | MIT Biomedical Multilingual | 56.50 | 61.10 | 62.50 | 50.5 | 53.60 |
26
- | Economics Macro | 51.60 | 59.80 | 60.20 | 60.9 |60.00 |
27
- | **Avg (ViDoRe2)** | **54.50** | **60.60** | **60.17** | **54**. |**55.96** |
28
 
29
  ## **NDCG@5 - REAL-MM-RAG**
30
  | Collection \ Model | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
31
  |----------------------------------------|--------------|------------------|-------------|--------------------------| ------------------
32
- | FinReport | 0.55 | 0.66 | 0.78 | 0.65 |0.70
33
- | FinSlides | 0.68 | 0.79 | 0.81 | 0.55 |0.74
34
- | TechReport | 0.78 | 0.86 | 0.88 | 0.83 |0.84
35
- | TechSlides | 0.90 | 0.93 | 0.92 | 0.91 |0.93
36
- | **Avg (REAL-MM-RAG)** | **0.73** | **0.81** | **0.85** | **0.74** |**0.80**
37
 
38
  - **Release Date**: June 11th 2025
39
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
@@ -105,7 +105,13 @@ print("=" * 50)
105
  For an example of MM-RAG using granite-vision-3.3-2b-embedding refer to [this notebook](......).
106
 
107
  **Model Architecture:**
108
- To Be Updated
 
 
 
 
 
 
109
 
110
  **Training Data:**
111
  Our training data is entirely comprised of DocFM. DocFM is a large-scale comprehensive dataset effort at IBM consisting of 85 million document pages extracted from unique PDF
 
17
  ## **NDCG@5 - ViDoRe V2**
18
  | Collection \ Model | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
19
  |----------------------------------------|--------------|------------------|-------------|-------------------|-----------
20
+ | ESG Restaurant Human | 51.1 | 68.4 | 65.8 | 62.4 | 62.3 |
21
+ | Economics Macro Multilingual | 49.9 | 56.5 | 55.4 | 47.4 | 48.3 |
22
+ | MIT Biomedical | 59.7 | 63.6 | 63.5 | 58.1 |60.0 |
23
+ | ESG Restaurant Synthetic | 57.0 | 57.4 | 56.6 | 51.1 |54.0 |
24
+ | ESG Restaurant Synthetic Multilingual | 55.7 | 57.4 | 57.2 | 47.6 |53.5 |
25
+ | MIT Biomedical Multilingual | 56.5 | 61.1 | 62.5 | 50.5 | 53.6 |
26
+ | Economics Macro | 51.6 | 59.8 | 60.2 | 60.9 |60.0 |
27
+ | **Avg (ViDoRe2)** | **54.5** | **60.6** | **60.2** | **54.0** | **56.0** |
28
 
29
  ## **NDCG@5 - REAL-MM-RAG**
30
  | Collection \ Model | ColPali-v1.3 | ColQwen2.5-v0.2 | ColNomic-3b | ColSmolvlm-v0.1 | granite-vision-3.3-2b-embedding |
31
  |----------------------------------------|--------------|------------------|-------------|--------------------------| ------------------
32
+ | FinReport | 55 | 66 | 78 | 65 |70
33
+ | FinSlides | 68 | 79 | 81 | 55 |74
34
+ | TechReport | 78 | 86 | 88 | 83 |84
35
+ | TechSlides | 90 | 93 | 92 | 91 |93
36
+ | **Avg (REAL-MM-RAG)** | **73** | **81** | **85** | **74** |**80**
37
 
38
  - **Release Date**: June 11th 2025
39
  - **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
105
  For an example of MM-RAG using granite-vision-3.3-2b-embedding refer to [this notebook](......).
106
 
107
  **Model Architecture:**
108
+ The architecture of granite-vision-3.3-2b-embedding follows the [ColPali](https://arxiv.org/abs/2407.01449) approach and consists of the following components:
109
+
110
+ (1) Vision-language model: [granite-vision-3.3-2b](https://huggingface.co/ibm-granite/granite-vision-3.3-2b).
111
+
112
+ (2) Projection layer: a linear layer that projects the hidden dimension of the vision-language model to 128 and outputs 729 embedding vectors per image.
113
+
114
+ Scoring is computed using a MaxSim-based late-interaction mechanism.
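
As a rough illustration of MaxSim-based late interaction, the sketch below scores a page against a query by taking, for each query vector, its highest dot product with any page vector, and summing those maxima. This is a minimal pure-Python sketch, not the model's actual implementation: the names `dot` and `maxsim_score` are hypothetical, and the toy vectors are 2-dimensional, whereas the description above states 128-dimensional vectors with 729 vectors per image.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (hypothetical helper, for illustration only).

    For each query vector, keep the largest dot product against any page
    vector, then sum those maxima over all query vectors.
    """
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


# Toy data: two query vectors and two candidate pages.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
page_b = [[0.5, 0.5], [0.0, 0.0]]

score_a = maxsim_score(query, page_a)
score_b = maxsim_score(query, page_b)

# Rank pages by descending score, as a retriever would.
ranked = sorted(
    {"page_a": score_a, "page_b": score_b}.items(),
    key=lambda kv: kv[1],
    reverse=True,
)
```

In practice the same computation is typically done in batches with tensor operations; the nested loops here are only meant to make the max-then-sum structure explicit.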
115
 
116
  **Training Data:**
117
  Our training data is entirly comprised from DocFM. DocFM is a large-scale comprehensive dataset effort at IBM consisting of 85 million document pages extracted from unique PDF