Commit · 2744513
1 Parent(s): 74fd142
Update README.md
README.md CHANGED
@@ -1,6 +1,11 @@
 ---
 language:
 - ru
+tags:
+- spellchecking
+- M2M100
+- pytorch
+- natural language generation
 ---
 
 # Card for ruM2M100-1.2B model
@@ -19,3 +24,11 @@ An extensive dataset with “artificial” errors was taken as a training corpus
 | прийдя в МГТУ я был удивлен никого необноружив там… | прийдя в МГТУ я был удивлен никого не обнаружив там... |
 | | |
 
+## Metrics
+### Quality
+Below are automatic metrics for determining the correctness of the spell checkers.
+We compare our solution with both open automatic spell checkers and the ChatGPT family of models on all four available datasets:
+- **RUSpellRU**: texts collected from ([LiveJournal](https://www.livejournal.com/media)), with manually corrected typos and errors;
+- **MultidomainGold**: examples from 7 text sources, including the open web, news, social media, reviews, subtitles, policy documents and literary works;
+- **MedSpellChecker**: texts with errors from medical anamnesis;
+- **GitHubTypoCorpusRu**: spelling errors and typos in commits from [GitHub](https://github.com);