---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---
# Model Overview
GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high-quality ranking.
## Training Configuration
- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- Training Dataset: [samheym/ger-dpr-collection](https://huggingface.co/datasets/samheym/ger-dpr-collection)
- Training Subset: a randomly selected 10% of the triples in the dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included
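As a rough sketch, this configuration could be reproduced with PyLate's sentence-transformers-based training loop. Anything below that is not listed above is an assumption: the `losses.Contrastive` objective (which uses in-batch negatives), the shuffling seed, the subsampling method, and the output directory.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

from pylate import losses, models, utils

# Initialise a ColBERT model from the German BERT base checkpoint;
# embedding_size and document_length mirror the configuration above.
model = models.ColBERT(
    model_name_or_path="deepset/gbert-base",
    embedding_size=128,
    document_length=256,
)

# Load the triples and keep a random 10% subset (seed is an assumption).
dataset = load_dataset("samheym/ger-dpr-collection", split="train")
dataset = dataset.shuffle(seed=42).select(range(len(dataset) // 10))

args = SentenceTransformerTrainingArguments(
    output_dir="GerColBERT",  # placeholder output path
    per_device_train_batch_size=50,
    gradient_accumulation_steps=1,
    learning_rate=5e-6,
    max_steps=80_000,
    # AdamW is the default optimizer of the underlying TrainingArguments.
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # Contrastive loss scores each query against its positive and all
    # in-batch negatives ("In-Batch Negatives: Included").
    loss=losses.Contrastive(model=model),
    data_collator=utils.ColBERTCollator(model.tokenize),
)
trainer.train()
```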
## Usage
First install the PyLate library:
```bash
pip install -U pylate
```
### Retrieval
PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
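Steps 2 and 3 build the index and run retrieval. The sketch below follows the generic PyLate indexing and retrieval pattern; the index folder, document IDs, and German example texts are placeholders, not part of this model's release.

```python
# Step 2: Build a Voyager HNSW index and add document embeddings.
index = indexes.Voyager(
    index_folder="pylate-index",  # placeholder path
    index_name="index",
    override=True,  # rebuild the index from scratch
)

documents_ids = ["1", "2"]
documents = [
    "Berlin ist die Hauptstadt von Deutschland.",
    "Die Donau ist der längste Fluss der Europäischen Union.",
]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # document-side encoding
    show_progress_bar=True,
)

index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Step 3: Encode queries and retrieve the top-k documents.
retriever = retrieve.ColBERT(index=index)

queries_embeddings = model.encode(
    ["Was ist die Hauptstadt von Deutschland?"],
    batch_size=32,
    is_query=True,  # query-side encoding
    show_progress_bar=True,
)

# Returns, per query, the k best document ids with their
# late-interaction scores.
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,
)
print(scores)
```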