Commit bb83a3f (parent: 27c0b4b): Update README.md

README.md — CHANGED
```diff
@@ -17,6 +17,11 @@ The model corrects spelling errors and typos by bringing all the words in the te
 Corrector was trained based on the model [M2M100-1.2B](https://huggingface.co/facebook/m2m100_1.2B).
 An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/orgs/ai-forever/sage).
 
+### Articles and speeches
+- [Speech about the SAGE library](https://youtu.be/yFfkV0Qjuu0), DataFest 2023
+- [Article about synthetic error generation methods](https://www.dialog-21.ru/media/5914/martynovnplusetal056.pdf), Dialogue 2023
+- [Article about SAGE and our best solution](https://arxiv.org/abs/2308.09435), Review EACL 2024
+
 ### Examples
 | Input | Output |
 | --- | --- |
@@ -105,7 +110,7 @@ print(answer)
 - [FredT5-large-spell](https://huggingface.co/ai-forever/FRED-T5-large-spell), HuggingFace
 - [T5-large-spell](https://huggingface.co/ai-forever/T5-large-spell), HuggingFace
 
-##
+## License
 Model [M2M100-1.2B](https://huggingface.co/facebook/m2m100_1.2B), on the basis of which our solution is made, and its source code are supplied under the MIT open license.
 Our solution also comes with an MIT license.
 
```