jinaai/jina-embeddings-v3

@@ -17456,33 +17456,6 @@ model-index:
       value: 60.887608967403914
     task:
       type: STS
-  - dataset:
-      config: default
-      name: MTEB QBQTC (default)
-      revision: 790b0510dc52b1553e8c49f3d2afb48c0e5c48b7
-      split: test
-      type: C-MTEB/QBQTC
-    metrics:
-      - type: cosine_pearson
-        value: 34.20049144526891
-      - type: cosine_spearman
-        value: 36.41802814113771
-      - type: euclidean_pearson
-        value: 34.569942139590626
-      - type: euclidean_spearman
-        value: 36.06141660786936
-      - type: main_score
-        value: 36.41802814113771
-      - type: manhattan_pearson
-        value: 34.537041543916003
-      - type: manhattan_spearman
-        value: 36.033418927773825
-      - type: pearson
-        value: 34.20049144526891
-      - type: spearman
-        value: 36.41802814113771
-    task:
-      type: STS
   - dataset:
       config: default
       name: MTEB STSB (default)
@@ -25042,7 +25015,7 @@ model-index:
 <br><br>
 <p align="center">
-<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px">
 </p>
@@ -25056,7 +25029,7 @@ model-index:
 ## Quick Start
-[Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3-vm) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
 ## Intended Usage & Model Info
@@ -25083,13 +25056,6 @@ While the foundation model supports 100 languages, we've focused our tuning effo
 Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
 Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
-> **⚠️ Important Notice:**
-> We fixed a bug in the `encode` function [#60](https://huggingface.co/jinaai/jina-embeddings-v3/discussions/60) where **Matryoshka embedding truncation** occurred *after normalization*, leading to non-normalized truncated embeddings. This issue has been resolved in the latest code revision.
->
-> If you have encoded data using the previous version and wish to maintain consistency, please use the specific code revision when loading the model: `AutoModel.from_pretrained('jinaai/jina-embeddings-v3', code_revision='da863dd04a4e5dce6814c6625adfba87b83838aa', ...)`
 ## Usage
 **<details><summary>Apply mean pooling when integrating the model.</summary>**
@@ -25240,15 +25206,6 @@ import onnxruntime
 import numpy as np
 from transformers import AutoTokenizer, PretrainedConfig
-# Mean pool function
-def mean_pooling(model_output: np.ndarray, attention_mask: np.ndarray):
-    token_embeddings = model_output
-    input_mask_expanded = np.expand_dims(attention_mask, axis=-1)
-    input_mask_expanded = np.broadcast_to(input_mask_expanded, token_embeddings.shape)
-    sum_embeddings = np.sum(token_embeddings * input_mask_expanded, axis=1)
-    sum_mask = np.clip(np.sum(input_mask_expanded, axis=1), a_min=1e-9, a_max=None)
-    return sum_embeddings / sum_mask
 # Load tokenizer and model config
 tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
 config = PretrainedConfig.from_pretrained('jinaai/jina-embeddings-v3')
@@ -25270,11 +25227,7 @@ inputs = {
 }
 # Run model
-outputs = session.run(None, inputs)[0]
-# Apply mean pooling and normalization to the model outputs
-embeddings = mean_pooling(outputs, input_text["attention_mask"])
-embeddings = embeddings / np.linalg.norm(embeddings, ord=2, axis=1, keepdims=True)
 ```
 </p>
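
For reference, the pooling and normalization that the removed lines of the last hunk performed can be reproduced without an ONNX session. The sketch below is a lightly condensed version of the removed `mean_pooling` helper, run on a tiny synthetic array that stands in for `session.run(None, inputs)[0]`. The array values and shapes are made up for illustration, and treating the first session output as token-level embeddings is an assumption carried over from the removed `[0]` indexing. Note that the replacement line keeps the full return value of `session.run`, which is a list of outputs rather than a single array.

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padded positions."""
    # (batch, seq) -> (batch, seq, 1) so the mask broadcasts over the hidden axis
    mask = np.expand_dims(attention_mask, axis=-1).astype(token_embeddings.dtype)
    summed = np.sum(token_embeddings * mask, axis=1)
    # Clip the token count so an all-padding row cannot divide by zero
    counts = np.clip(np.sum(mask, axis=1), a_min=1e-9, a_max=None)
    return summed / counts

# Synthetic stand-in for the token-level output (batch=1, seq=3, hidden=2).
# The third token is padding and must not affect the result.
token_embeddings = np.array([[[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]], dtype=np.float32)
attention_mask = np.array([[1, 1, 0]], dtype=np.int64)

pooled = mean_pooling(token_embeddings, attention_mask)
# L2-normalize each row, as the removed lines did
embeddings = pooled / np.linalg.norm(pooled, ord=2, axis=1, keepdims=True)
```

With the padded token masked out, `pooled` is the average of the first two token vectors, and each row of `embeddings` has unit length.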

       value: 60.887608967403914
     task:
       type: STS
   - dataset:
       config: default
       name: MTEB STSB (default)
 <br><br>
 <p align="center">
+<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
 </p>
 ## Quick Start
+[Blog](https://jina.ai/news/jina-embeddings-v3-a-frontier-multilingual-embedding-model/#parameter-dimensions) | [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/jinaai.jina-embeddings-v3) | [AWS SageMaker](https://aws.amazon.com/marketplace/pp/prodview-kdi3xkt62lo32) | [API](https://jina.ai/embeddings)
 ## Intended Usage & Model Info
 Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian,
 Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu,** and **Vietnamese.**
 ## Usage
 **<details><summary>Apply mean pooling when integrating the model.</summary>**
 import numpy as np
 from transformers import AutoTokenizer, PretrainedConfig
 # Load tokenizer and model config
 tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v3')
 config = PretrainedConfig.from_pretrained('jinaai/jina-embeddings-v3')
 }
 # Run model
+outputs = session.run(None, inputs)
 ```
 </p>
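
The notice removed in the `@@ -25083,13 +25056,6 @@` hunk describes an ordering problem: truncating a Matryoshka embedding after normalization leaves the truncated vector shorter than unit length. The following is a minimal NumPy sketch of that effect on random data, not the model's actual `encode` implementation. The helper names, the 1024-dimensional input, and the 256-dimensional target are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def normalize_then_truncate(embeddings: np.ndarray, dim: int) -> np.ndarray:
    # Order described in the notice as the bug: rows end up with norm below 1
    return l2_normalize(embeddings)[:, :dim]

def truncate_then_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    # Order that yields unit-norm truncated embeddings
    return l2_normalize(embeddings[:, :dim])

# Random vectors standing in for full-size embeddings (hypothetical data)
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 1024))

buggy_norms = np.linalg.norm(normalize_then_truncate(full, 256), axis=1)
fixed_norms = np.linalg.norm(truncate_then_normalize(full, 256), axis=1)
```

Here every entry of `fixed_norms` equals 1 up to floating-point error, while every entry of `buggy_norms` is below 1. This is why data encoded under the old behavior is not directly comparable with data encoded after the fix unless the pinned `code_revision` is used or the old vectors are re-normalized.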