---
language:
- multilingual
- ar
- bg
- ca
- cs
- da
- de
- el
- en
- es
- et
- fa
- fi
- fr
- gl
- gu
- he
- hi
- hr
- hu
- hy
- id
- it
- ja
- ka
- ko
- ku
- lt
- lv
- mk
- mn
- mr
- ms
- my
- nb
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sq
- sr
- sv
- th
- tr
- uk
- ur
- vi
- yo
license: mit
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
language_bcp47:
- fr-ca
- pt-br
- zh-cn
- zh-tw
pipeline_tag: sentence-similarity
inference: false
---

## 0xnu/pmmlv2-fine-tuned-yoruba

A Yoruba sentence-embedding model fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). It maps Yoruba sentences to dense vectors for sentence-similarity tasks.

> Yoruba words are built from combinations of vowels and consonants. Standard Yoruba has eighteen consonants and seven oral vowels. Words vary in length and complexity but generally follow consistent patterns of syllable structure and pronunciation. Written Yoruba also uses diacritics: accents mark tone, and underdots distinguish letters such as ẹ, ọ, and ṣ. These marks are essential to the language's phonology and meaning.
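Because these diacritics carry meaning, the same written word can reach the model as different Unicode code-point sequences: an underdotted vowel with a tone mark may be stored with a precomposed letter or as a base letter plus combining marks, in either order. The card does not say how the training text was normalized, so the following is a precaution, not the author's preprocessing: a minimal sketch that uses Python's standard `unicodedata` module to bring text to NFC form. The helper name `normalize_yoruba` is hypothetical.

```python
import unicodedata


def normalize_yoruba(text: str) -> str:
    """Hypothetical helper: return the NFC form of text so that
    canonically equivalent diacritic sequences become identical strings."""
    return unicodedata.normalize("NFC", text)


# "ọ̀" (o + underdot + low-tone grave) written three different ways:
variants = [
    "o\u0323\u0300",  # base o, combining dot below, combining grave
    "o\u0300\u0323",  # same marks in the opposite order
    "\u00f2\u0323",   # precomposed ò, then combining dot below
]

# The raw strings are not equal to each other...
print([v == variants[0] for v in variants])

# ...but after NFC normalization they collapse to a single string.
normalized = {normalize_yoruba(v) for v in variants}
print(len(normalized))
```

To apply it, normalize inputs before encoding, for example `model.encode([normalize_yoruba(s) for s in sentences])`. Whether the model's tokenizer already performs equivalent normalization internally is not stated in this card.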

### Usage (Sentence-Transformers)

The model is easiest to use with [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

### Embeddings

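```python
from sentence_transformers import SentenceTransformer

# Two Yoruba example sentences to embed
sentences = ["Kini olu ilu England", "Kini eranko ti o gbona julọ ni agbaye?"]

# Download (on first use) and load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-yoruba')
# encode() returns one embedding vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)
```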
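Each row of `embeddings` is a vector, and two sentences are compared by the cosine of the angle between their vectors, which the Advanced Usage example computes with `util`. To illustrate what that score means, here is a minimal pure-Python sketch of pairwise cosine similarity. `cosine_similarity` and `cosine_matrix` are hypothetical helpers, not part of sentence-transformers, and the toy vectors stand in for real embeddings (the base model produces 384-dimensional ones).

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarities; entry [i][j] compares vectors[i] with vectors[j]."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy 3-dimensional vectors standing in for real sentence embeddings
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
for row in cosine_matrix(toy):
    print([round(score, 4) for score in row])
```

With the real model, `cosine_matrix(embeddings.tolist())` should agree, up to floating-point error, with `util.cos_sim(embeddings, embeddings)`, which performs the same computation in PyTorch and is much faster for large batches.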

### Advanced Usage

```python
from sentence_transformers import SentenceTransformer, util
import torch

# Define sentences
sentences = [
    "Kini olu ilu England",
    "Kini eranko ti o gbona julọ ni agbaye?",
    "Bawo ni o se le kọ ede Yoruba?",
    "Kini ounje to gbajumo julọ ni Naijiria?",
    "Iru aso wo ni a maa n wọ fun ijo Yoruba?"
]

# Load the model
model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-yoruba')

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Return the sentence whose embedding is closest (by cosine similarity) to the query
def find_closest_sentence(query_embedding, sentence_embeddings, sentences):
    # Cosine similarity between the query and every sentence; [0] selects the single query row
    cosine_scores = util.cos_sim(query_embedding, sentence_embeddings)[0]
    # Index of the highest score
    best_match_index = torch.argmax(cosine_scores).item()
    return sentences[best_match_index], cosine_scores[best_match_index].item()

# This query also appears in `sentences`, so it matches itself with a score close to 1.0
query = "Kini olu ilu England"  # "What is the capital of England?"
query_embedding = model.encode(query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(query_embedding, embeddings, sentences)

print(f"Ibeere: {query}")  # "Question"
print(f"Gbolohun ti o jọ mọ julọ: {closest_sentence}")  # "Most similar sentence"
print(f"Iwọn ijọra: {similarity_score:.4f}")  # "Similarity score"

# Try a new sentence that is not in the original list
new_query = "Kini oruko oba to wa ni ilu Oyo?"  # "What is the name of the king of Oyo?"
new_query_embedding = model.encode(new_query, convert_to_tensor=True)
closest_sentence, similarity_score = find_closest_sentence(new_query_embedding, embeddings, sentences)

print(f"\nIbeere Tuntun: {new_query}")  # "New question"
print(f"Gbolohun ti o jọ mọ julọ: {closest_sentence}")  # "Most similar sentence"
print(f"Iwọn ijọra: {similarity_score:.4f}")  # "Similarity score"
```
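`find_closest_sentence` always returns some sentence, even when the query has no real counterpart in the list (as with the Oyo question), and it reports only the single best match. One way to return several ranked matches and drop weak ones is sketched below. `rank_matches` is a hypothetical helper, the toy scores are made up for illustration, and any `min_score` value is an assumption, since this card gives no calibrated threshold.

```python
from typing import List, Optional, Sequence, Tuple


def rank_matches(
    scores: Sequence[float],
    sentences: Sequence[str],
    top_k: int = 3,
    min_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return up to top_k (sentence, score) pairs,
    best first, optionally dropping matches scoring below min_score."""
    if len(scores) != len(sentences):
        raise ValueError("scores and sentences must have the same length")
    # Sort sentence/score pairs by score, highest first
    ranked = sorted(zip(sentences, scores), key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        ranked = [(text, score) for text, score in ranked if score >= min_score]
    return ranked[:top_k]


# Toy scores standing in for one row of cosine similarities
example_scores = [0.21, 0.88, 0.47]
example_sentences = ["a", "b", "c"]
print(rank_matches(example_scores, example_sentences, top_k=2))
print(rank_matches(example_scores, example_sentences, min_score=0.9))
```

With the model loaded as above, the scores could come from `util.cos_sim(new_query_embedding, embeddings)[0].tolist()` and be passed as `rank_matches(scores, sentences, top_k=3, min_score=0.5)`; the 0.5 cut-off is arbitrary and would need tuning on real Yoruba queries. sentence-transformers also provides `util.semantic_search` for top-k retrieval over larger corpora.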

### License

This project is licensed under the [MIT License](./LICENSE).

### Copyright

(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).