Commit 6cf649a by bhavitvyamalik · 1 Parent(s): 329e204

update README

Files changed (1)
  1. README.md +13 -17
README.md CHANGED
@@ -43,26 +43,22 @@ The translated dataset can be downloaded from [conceptual-12m-multilingual-maria
43
  Though the original dataset contains 12M image-text pairs, a lot of the URLs are invalid now, and in some cases, images are corrupt or broken. We remove such examples from our data, which leaves us with approximately 10M image-text pairs.
44
 
45
  #### **Train set:**
46
- Total data: <br>
47
- 10010625 captions <br>
48
- 2502656 images br><br>
49
 
50
- Language-wise distribution: <br>
51
- English: 2502656 captions<br>
52
- Spanish: 2502656 captions<br>
53
- Deutsch: 2502656 captions<br>
54
- French: 2502656 captions<br>
55
 
56
  #### **Validation set**
57
- Total data: <br>
58
- 110592 captions <br>
59
- 27648 images <br><br>
60
-
61
- Language-wise distribution: <br>
62
- English: 27648 captions<br>
63
- Spanish: 27648 captions<br>
64
- Deutsch: 27648 captions<br>
65
- French: 27648 captions<br>
66
 
67
  ## Training procedure 👨🏻‍💻
68
  ### Training
 
43
  Though the original dataset contains 12M image-text pairs, a lot of the URLs are invalid now, and in some cases, images are corrupt or broken. We remove such examples from our data, which leaves us with approximately 10M image-text pairs.
44
 
45
  #### **Train set:**
46
+ Total data: 10010625 captions, 2502656 images <br>
 
 
47
 
48
+ Language-wise captions distribution: <br>
49
+ English: 2502656<br>
50
+ Spanish: 2502656<br>
51
+ Deutsch: 2502656<br>
52
+ French: 2502656<br>
53
 
54
  #### **Validation set**
55
+ Total data: 110592 captions, 27648 images <br>
56
+
57
+ Language-wise captions distribution: <br>
58
+ English: 27648<br>
59
+ Spanish: 27648<br>
60
+ Deutsch: 27648<br>
61
+ French: 27648<br>
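
As a quick sanity check on the figures in the updated README, the sketch below sums the language-wise caption counts and compares them with the stated totals. The numbers are copied from the added lines; the names `LANGUAGES` and `caption_gap` are illustrative and not part of the repository. For the validation set the per-language counts add up to the stated total, while for the train set the stated total is one higher than the sum; the README does not explain the difference.

```python
# Counts copied from the added README lines; all names here are illustrative.
LANGUAGES = ("English", "Spanish", "Deutsch", "French")

# Every language has the same caption count in the README.
train_per_language = {lang: 2502656 for lang in LANGUAGES}
val_per_language = {lang: 27648 for lang in LANGUAGES}


def caption_gap(stated_total, per_language):
    """Return the stated total minus the sum of per-language caption counts."""
    return stated_total - sum(per_language.values())


train_gap = caption_gap(10010625, train_per_language)
val_gap = caption_gap(110592, val_per_language)

print("train: stated total minus per-language sum =", train_gap)
print("validation: stated total minus per-language sum =", val_gap)
```

A gap of zero means the language-wise distribution fully accounts for the stated total; a non-zero gap only flags a discrepancy and does not say which figure is the correct one.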
 
 
62
 
63
  ## Training procedure 👨🏻‍💻
64
  ### Training