Commit 3b8ebf8
Parent(s): 10e0b39

Update README.md + Add logo

- README.md +72 -0
- assets/beetle_logo.png +0 -0

README.md CHANGED
@@ -1,3 +1,75 @@
 ---
+base_model: mixedbread-ai/mxbai-embed-2d-large-v1
+language:
+- en
+library_name: model2vec
 license: mit
+model_name: red-beetle-base-v1
+tags:
+- embeddings
+- static-embeddings
+- sentence-transformers
 ---
+# 🪲 red-beetle-base-v1 Model Card
+
+<div align="center">
+<img width="75%" alt="Beetle logo" src="./assets/beetle_logo.png">
+</div>
+
+> [!TIP]
+> Beetles are some of the most diverse and interesting creatures on Earth. They are found in every environment, from the deepest oceans to the highest mountains. They are also known for their ability to adapt to a wide range of habitats and lifestyles. They are small, fast and powerful!
+
+The beetle series of models are made as good starting points for Static Embedding training (via TokenLearn or Fine-tuning), as well as decent Static Embedding models. Each beetle model is made to be an improvement over the original **M2V_base_output** model in some way, and that's the threshold we set for each model (except the brown beetle series, which is the original model).
+
+This model has been distilled from `mixedbread-ai/mxbai-embed-2d-large-v1`, without PCA or Zipf. This model is a good initialization point for further training.
+
+## Version Information
+
+- **red-beetle-base-v0**: The original model, without using PCA or Zipf. The lack of PCA and Zipf also makes this a decent model for further training.
+- **red-beetle-base-v1**: The original model, without PCA but with (Zipf)^3 re-weighting. It has 1024 dimension embeddings.
+- **red-beetle-small-v1**: A smaller version of the original model, with PCA at 384 dimensions and (Zipf)^3 re-weighting.
+
+## Installation
+
+Install model2vec using pip:
+
+```bash
+pip install model2vec
+```
+
+## Usage
+
+Load this model using the `from_pretrained` method:
+
+```python
+from model2vec import StaticModel
+
+# Load a pretrained Model2Vec model
+model = StaticModel.from_pretrained("bhavnicksm/red-beetle-base-v1")
+
+# Compute text embeddings
+embeddings = model.encode(["Example sentence"])
+```
+
+Read more about the Model2Vec library [here](https://github.com/MinishLab/model2vec).
+
+## Comparison with other models
+
+Coming soon...
+
+## Acknowledgements
+
+This model is made using the [Model2Vec](https://github.com/MinishLab/model2vec) library. Credit goes to the [Minish Lab](https://github.com/MinishLab) team for developing this library.
+
+## Citation
+
+Please cite the [Model2Vec repository](https://github.com/MinishLab/model2vec) if you use this model in your work.
+
+```bibtex
+@software{minishlab2024model2vec,
+authors = {Stephan Tulkens, Thomas van Dongen},
+title = {Model2Vec: Turn any Sentence Transformer into a Small Fast Model},
+year = {2024},
+url = {https://github.com/MinishLab/model2vec},
+}
+```
assets/beetle_logo.png ADDED