manu committed on
Commit 0c5c10f · verified · 1 Parent(s): bdf5aa1

Update README.md

Files changed (1)
  1. README.md +35 -25
README.md CHANGED
@@ -13,34 +13,19 @@ tags:
 
  # ModernColBERT + InSeNT
 
- This is a contextual model finetuned from [lightonai/GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
-
- ## Model Details
-
- ### Model Description
- - **Model Type:** Sentence Transformer
- - **Base model:** [lightonai/GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1)
- - **Maximum Sequence Length:** tokens
- - **Output Dimensionality:** 128 dimensions
- - **Similarity Function:** MaxSim
- - **Training Dataset:**
- - train
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
- - **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
-
- ### Full Model Architecture
-
- ```
- ColBERT(
-   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
-   (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
- )
- ```
-
+ [![arXiv](https://img.shields.io/badge/arXiv-2505.24782-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2505.24782)
+ [![GitHub](https://img.shields.io/badge/Code_Repository-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/illuin-tech/contextual-embeddings)
+ [![Hugging Face](https://img.shields.io/badge/ConTEB_HF_Page-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/illuin-conteb)
+
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/60f2e021adf471cbdf8bb660/jq_zYRy23bOZ9qey3VY4v.png" width="800">
+
+ This is a contextual model finetuned from [lightonai/GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) on the ConTEB training dataset. It was trained using the InSeNT training approach, detailed in the corresponding paper.
+
+ > [!WARNING]
+ > This experimental model stems from the paper [*Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings*](https://arxiv.org/abs/2505.24782).
+ > While results are promising, we have seen regression on standard embedding tasks, and using it in production will probably require further work on extending the training set to improve robustness and OOD generalization.
+
  ## Usage
 
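The card describes a *contextual* model: each chunk's multi-vector embedding is computed with access to the rest of its document rather than in isolation, which is why the usage example returns embeddings per document and per chunk. The sketch below illustrates that general idea only; it is not the repository's API, it loads the base encoder named above directly with `transformers`, and it skips the 128-dimensional Dense projection listed in the architecture.

```python
# Illustrative sketch only (not the repository's API): encode the whole
# document once, then slice the contextualized token states back into
# per-chunk multi-vectors. Assumes the base encoder loads via AutoModel;
# the 128-dim Dense projection from the architecture is omitted here.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "lightonai/GTE-ModernColBERT-v1"  # base model named in the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

chunks = [
    "The Eiffel Tower was completed in 1889.",
    "It remained the tallest man-made structure for 41 years.",
]
document = " ".join(chunks)

# One forward pass over the full document: every chunk's tokens attend to
# the surrounding chunks (up to the 8192-token window).
enc = tokenizer(document, return_offsets_mapping=True, return_tensors="pt")
offsets = enc.pop("offset_mapping")[0]
with torch.no_grad():
    token_states = encoder(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)

# Map contextualized token states back to their chunk via character offsets.
chunk_embeddings, start = [], 0
for chunk in chunks:
    end = start + len(chunk)
    in_chunk = (offsets[:, 0] >= start) & (offsets[:, 1] <= end) & (offsets[:, 1] > offsets[:, 0])
    chunk_embeddings.append(token_states[in_chunk])
    start = end + 1  # skip the joining space

print(f"Shape of first chunk embedding: {chunk_embeddings[0].shape}")
```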
@@ -81,10 +66,35 @@ print(f"Shape of first chunk embedding: {embeddings[0][0].shape}") # torch.Size(
  ```
 
 
- ## Citation
-
- ### BibTeX
-
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [lightonai/GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1)
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 128 dimensions
+ - **Similarity Function:** MaxSim
+ - **Training Dataset:**
+   - train
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
+ - **Hugging Face:** [Contextual Embeddings](https://huggingface.co/illuin-conteb)
+
+ ### Full Model Architecture
+
+ ```
+ ColBERT(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
+ )
+ ```
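For reference, the MaxSim similarity listed in the description operates on these 128-dimensional token embeddings: each query token is matched to its most similar document token and the per-token maxima are summed. A minimal, self-contained sketch (illustrative only, with random unit vectors standing in for real embeddings):

```python
# Minimal sketch of ColBERT-style MaxSim scoring over multi-vector embeddings.
# Random unit vectors stand in for the model's 128-dim token embeddings.
import torch

def maxsim(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> torch.Tensor:
    """query_emb: (n_query_tokens, 128), doc_emb: (n_doc_tokens, 128)."""
    sim = query_emb @ doc_emb.T          # pairwise token-level similarities
    return sim.max(dim=1).values.sum()   # best doc token per query token, summed

query_emb = torch.nn.functional.normalize(torch.randn(8, 128), dim=-1)
doc_emb = torch.nn.functional.normalize(torch.randn(300, 128), dim=-1)
print(maxsim(query_emb, doc_emb))
```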
+
+
+ ## Citation
 
  ```bibtex
  @misc{conti2025contextgoldgoldpassage,