| language: | |
| - en | |
| pipeline_tag: sentence-similarity | |
| tags: | |
| - sentence-transformers | |
| - feature-extraction | |
| - sentence-similarity | |
| - transformers | |
| - domain-specific | |
| library_name: sentence-transformers | |
| # **YGOMiniLM** | |
|  | |
| [ImgSource](https://yugipedia.com/wiki/Time_Wizard) | |
| This is a `sentence-transformers/paraphrase-MiniLM-L3-v2` model that has undergone further domain-specific pretraining via masked language modelling (MLM). | |
| Its intended use is to create sentence embeddings for fast vector search in the domain of YuGiOh discourse. | |
| ## **Training Data** | |
| The training data was split into two parts: | |
| 1) A private collection of YouTube comments, broken down by creator: | |
| |CREATOR|N_COMMENTS| | |
| |-----|-----| | |
| |thecalieffect|20,592| | |
| |MBTYuGiOh|5,439| | |
| |MSTTV|5,340| | |
| |mkohl40|5,224| | |
| 2) The full database of YuGiOh cards, accessed via the [YGOProDeck API](https://ygoprodeck.com/api-guide/) as of 17/05/2023. For each card, the `name`, `type`, `race` and `desc` fields were concatenated, separated by tab characters (`\t`), to form one training example. | |
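As a rough sketch of how one such training example could be assembled, the snippet below joins the four fields with tabs. The `card_to_example` helper and the sample record are hypothetical: the actual preprocessing script is not part of this card, and the record is written by hand here rather than fetched from the API.

```python
# Field order follows the description above: name, type, race, desc.
FIELDS = ("name", "type", "race", "desc")


def card_to_example(card: dict) -> str:
    """Join the selected card fields into one tab-delimited string.

    Hypothetical helper for illustration; missing fields become empty strings.
    """
    return "\t".join(str(card.get(field, "")) for field in FIELDS)


# Hand-written record standing in for one entry of the API response.
card = {
    "name": "Time Wizard",
    "type": "Effect Monster",
    "race": "Spellcaster",
    "desc": "(card text goes here)",
}

example = card_to_example(card)
print(example)
```

Using a fixed `FIELDS` tuple keeps the column order stable across cards, which matters if the tab positions are meant to carry meaning during pretraining.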
| ## **Usage** | |
| First install the `sentence-transformers` library: | |
| ``` | |
| pip install sentence-transformers | |
| ``` | |
| Then load the model and encode your sentences to get embeddings: | |
| ``` | |
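| from sentence_transformers import SentenceTransformer | |
| # Example inputs: one card effect text and two community-style comments. | |
| sentences = [ | |
|     "FLIP: Target 1 monster on the field; destroy that target.", | |
|     "Union Carrier needs to go.", | |
|     "Scythe lock is healthy for the game", | |
| ] | |
| # Download (or load from cache) the model from the Hugging Face Hub. | |
| model = SentenceTransformer("jkswin/YGO_MiniLM") | |
| # encode() returns one embedding vector per input sentence. | |
| embeddings = model.encode(sentences) | |
| print(embeddings) | |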
| ``` |
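For the vector search use case this model is intended for, embeddings are typically compared by cosine similarity. The following is a minimal, dependency-free sketch of that ranking step. The function names are illustrative and not part of `sentence-transformers`, and the toy 3-dimensional vectors merely stand in for the output of `model.encode`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs for corpus vectors, most similar first."""
    scores = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real sentence embeddings.
corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query = [1.0, 0.1, 0.0]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

With real embeddings, `rank_by_similarity(embeddings[0], embeddings[1:])` would order the remaining sentences by closeness to the first. For larger corpora, `sentence_transformers.util.cos_sim` computes the same scores in batch.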