Commit
·
6cf649a
1
Parent(s):
329e204
update README
README.md
CHANGED
@@ -43,26 +43,22 @@ The translated dataset can be downloaded from [conceptual-12m-multilingual-maria
|
|
43 |
Though the original dataset contains 12M image-text pairs, a lot of the URLs are invalid now, and in some cases, images are corrupt or broken. We remove such examples from our data, which leaves us with approximately 10M image-text pairs.
|
44 |
|
45 |
#### **Train set:**
|
46 |
-
Total data: <br>
|
47 |
-
10010625 captions <br>
|
48 |
-
2502656 images br><br>
|
49 |
|
50 |
-
Language-wise distribution: <br>
|
51 |
-
English: 2502656
|
52 |
-
Spanish: 2502656
|
53 |
-
Deutsch: 2502656
|
54 |
-
French: 2502656
|
55 |
|
56 |
#### **Validation set**
|
57 |
-
Total data: <br>
|
58 |
-
|
59 |
-
|
60 |
-
|
61 |
-
|
62 |
-
|
63 |
-
|
64 |
-
Deutsch: 27648 captions<br>
|
65 |
-
French: 27648 captions<br>
|
66 |
|
67 |
## Training procedure 👨🏻‍💻
|
68 |
### Training
|
|
|
43 |
Though the original dataset contains 12M image-text pairs, a lot of the URLs are invalid now, and in some cases, images are corrupt or broken. We remove such examples from our data, which leaves us with approximately 10M image-text pairs.
|
44 |
|
45 |
#### **Train set:**
|
46 |
+
Total data: 10010625 captions, 2502656 images <br>
|
|
|
|
|
47 |
|
48 |
+
Language-wise captions distribution: <br>
|
49 |
+
English: 2502656<br>
|
50 |
+
Spanish: 2502656<br>
|
51 |
+
Deutsch: 2502656<br>
|
52 |
+
French: 2502656<br>
|
53 |
|
54 |
#### **Validation set**
|
55 |
+
Total data: 110592 captions, 27648 images <br>
|
56 |
+
|
57 |
+
Language-wise captions distribution: <br>
|
58 |
+
English: 27648<br>
|
59 |
+
Spanish: 27648<br>
|
60 |
+
Deutsch: 27648<br>
|
61 |
+
French: 27648<br>
|
|
|
|
|
62 |
|
63 |
## Training procedure 👨🏻‍💻
|
64 |
### Training
|