Update README.md
README.md
CHANGED
@@ -118,15 +118,15 @@ You can use the code displayed above, or download the files from the directory a
 ### Training Data
 
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-real_companies_1
+real_companies_1 26764 --real company names in EU languages, Russian and various other languages usually in LATIN format.
 
-real_companies_2
+real_companies_2 24790 --real company names in EU languages, Russian and various other languages usually in LATIN format.
 
-real_companies_arabic
+real_companies_arabic 2317 --real company names in Arabic
 
-real_companies_ea
+real_companies_ea 20328 --real company names in Chinese, Korean, Japanese
 
-synthetic_companies_eu
+synthetic_companies_eu 20000 --synthetic company names in EU languages
 
 The entire dataset was split in 8-1-1 as training-validation-testing set
 A typical data entry is
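
For reviewers who want to sanity-check the numbers in this change, the 8-1-1 split over the five listed sources could be reproduced roughly as follows. This is a minimal sketch, not the author's procedure: the README does not say how the split was made (for example, whether it was stratified per source), so the seeded shuffle, the helper name `split_8_1_1`, and the use of integer indices as stand-ins for real entries are all assumptions.

```python
import random

# Entry counts per source, copied from the updated README lines.
SOURCE_SIZES = {
    "real_companies_1": 26764,
    "real_companies_2": 24790,
    "real_companies_arabic": 2317,
    "real_companies_ea": 20328,
    "synthetic_companies_eu": 20000,
}


def split_8_1_1(entries, seed=0):
    """Shuffle entries and split them 80/10/10 (hypothetical helper)."""
    items = list(entries)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = len(items) * 8 // 10      # integer arithmetic avoids float rounding
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


total = sum(SOURCE_SIZES.values())
# Integer indices stand in for the actual dataset entries.
train, val, test = split_8_1_1(range(total))
print(total, len(train), len(val), len(test))
```

Because the remainder is assigned to the test set, the three parts always cover every entry exactly once, even when the total is not divisible by ten.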