Nishant24 committed
Commit 99bb629 · 1 Parent(s): 17fb100

Nishant24/mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant

Files changed (4)
  1. README.md +142 -40
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. training_args.bin +3 -0
README.md CHANGED
@@ -1,53 +1,155 @@
  ---
- license: apache-2.0
- language:
- - en
- - hi
- metrics:
- - bleu
- library_name: transformers
- pipeline_tag: translation
  tags:
- - code
  ---

- This model is a fine-tuned checkpoint of mbart-large-50-many-to-many-mmt, adapted for Siddha Yoga Hindi-to-English translation. The base model was introduced in the paper "Multilingual Translation with Extensible Multilingual Pretraining and Finetuning": https://arxiv.org/pdf/2008.00401.pdf

- The base model can translate directly between any pair of 50 languages. To translate into a target language, the target language id must be forced as the first generated token; pass the forced_bos_token_id parameter to the generate method.

- This model was fine-tuned by Nishant Chhetri as part of an M.Tech in Data Science dissertation project at BITS Pilani.

- Code to use the model for inference:

- ```python
- from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
-
- article_hi = "इससे पूर्व हमने 'साधन योग' का वर्णन किया था और उसमें यह समझाया था कि साधना करने वाला हर एक व्यक्ति बिना किसी प्रकार की दीनता के योग के साधनों द्वारा अपने परम लक्ष्यों को प्राप्त कर सकता है।"
- article_eng = "Before this I described “Sadhan Yoga” and explained that any sadhak can reach their highest goals through practices of Yoga without any meekness."
-
- # load the fine-tuned checkpoint published in this repo
- model = MBartForConditionalGeneration.from_pretrained("Nishant24/mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant")
- tokenizer = MBart50TokenizerFast.from_pretrained("Nishant24/mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant")
-
- # translate Hindi to English
- tokenizer.src_lang = "hi_IN"
- encoded_hi = tokenizer(article_hi, return_tensors="pt")
- generated_tokens = model.generate(
-     **encoded_hi,
-     forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
- )
- tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
- # => "Before this I described “Sadhan Yoga” and explained that any sadhak can reach their highest goals through practices of Yoga without any meekness."
- ```

- ## BibTeX entry and citation info
- ```
- @article{tang2020multilingual,
-   title={Multilingual Translation with Extensible Multilingual Pretraining and Finetuning},
-   author={Yuqing Tang and Chau Tran and Xian Li and Peng-Jen Chen and Naman Goyal and Vishrav Chaudhary and Jiatao Gu and Angela Fan},
-   year={2020},
-   eprint={2008.00401},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
- }
- ```
  ---
+ base_model: facebook/mbart-large-50-many-to-many-mmt
  tags:
+ - generated_from_trainer
+ model-index:
+ - name: mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant
+
+ This model is a fine-tuned version of [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt) on a Siddha Yoga Hindi-to-English text dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0001
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a minimal reproduction sketch follows the list):
+ - learning_rate: 0.001
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 100
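
For readers who want to reproduce this setup, the sketch below maps the listed hyperparameters onto the transformers `Seq2SeqTrainer`. It is a minimal illustration, not the author's original training script: the sentence-pair placeholders, the preprocessing function, and the `output_dir` name are assumptions, since the Siddha Yoga parallel corpus is not published in this repo.

```python
# Minimal sketch (not the author's original script): reproduces the listed
# hyperparameters with transformers' Seq2SeqTrainer. The sentence pairs, the
# preprocessing, and the output_dir are illustrative placeholders.
from datasets import Dataset
from transformers import (
    DataCollatorForSeq2Seq,
    MBart50TokenizerFast,
    MBartForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(base, src_lang="hi_IN", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained(base)

# Placeholder parallel data: replace with the actual Siddha Yoga sentence pairs.
raw = Dataset.from_dict({
    "hi": ["<Hindi source sentence>"],
    "en": ["<English reference translation>"],
})

def preprocess(batch):
    # Tokenize sources and targets; text_target uses the tokenizer's tgt_lang ("en_XX").
    return tokenizer(batch["hi"], text_target=batch["en"], truncation=True, max_length=200)

tokenized = raw.map(preprocess, batched=True, remove_columns=["hi", "en"])

args = Seq2SeqTrainingArguments(
    output_dir="mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant",
    learning_rate=1e-3,               # learning_rate: 0.001
    per_device_train_batch_size=4,    # train_batch_size: 4
    per_device_eval_batch_size=4,     # eval_batch_size: 4
    seed=42,                          # seed: 42
    lr_scheduler_type="linear",       # lr_scheduler_type: linear
    num_train_epochs=100,             # num_epochs: 100
    evaluation_strategy="epoch",      # validation loss logged once per epoch
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,           # placeholder; the real eval split is not public
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The Trainer's default AdamW optimizer already uses betas=(0.9, 0.999) and epsilon=1e-08, so the optimizer entry in the list needs no explicit configuration, and evaluation_strategy="epoch" matches the once-per-epoch validation losses in the table below.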
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | No log | 1.0 | 2 | 9.7387 |
+ | No log | 2.0 | 4 | 7.5348 |
+ | No log | 3.0 | 6 | 3.8893 |
+ | No log | 4.0 | 8 | 1.6238 |
+ | No log | 5.0 | 10 | 0.6203 |
+ | No log | 6.0 | 12 | 0.5660 |
+ | No log | 7.0 | 14 | 0.2931 |
+ | No log | 8.0 | 16 | 0.3130 |
+ | No log | 9.0 | 18 | 0.1872 |
+ | No log | 10.0 | 20 | 0.2907 |
+ | No log | 11.0 | 22 | 0.0909 |
+ | No log | 12.0 | 24 | 0.1210 |
+ | No log | 13.0 | 26 | 0.1342 |
+ | No log | 14.0 | 28 | 0.0887 |
+ | No log | 15.0 | 30 | 0.0744 |
+ | No log | 16.0 | 32 | 0.2141 |
+ | No log | 17.0 | 34 | 0.0569 |
+ | No log | 18.0 | 36 | 0.0495 |
+ | No log | 19.0 | 38 | 0.0500 |
+ | No log | 20.0 | 40 | 0.0419 |
+ | No log | 21.0 | 42 | 0.5107 |
+ | No log | 22.0 | 44 | 0.0366 |
+ | No log | 23.0 | 46 | 0.0525 |
+ | No log | 24.0 | 48 | 0.0429 |
+ | No log | 25.0 | 50 | 0.1173 |
+ | No log | 26.0 | 52 | 0.1467 |
+ | No log | 27.0 | 54 | 0.1957 |
+ | No log | 28.0 | 56 | 0.1132 |
+ | No log | 29.0 | 58 | 0.1369 |
+ | No log | 30.0 | 60 | 0.3323 |
+ | No log | 31.0 | 62 | 0.1180 |
+ | No log | 32.0 | 64 | 0.0698 |
+ | No log | 33.0 | 66 | 0.0392 |
+ | No log | 34.0 | 68 | 0.0306 |
+ | No log | 35.0 | 70 | 0.0389 |
+ | No log | 36.0 | 72 | 0.0297 |
+ | No log | 37.0 | 74 | 0.0304 |
+ | No log | 38.0 | 76 | 0.0304 |
+ | No log | 39.0 | 78 | 0.1154 |
+ | No log | 40.0 | 80 | 0.0285 |
+ | No log | 41.0 | 82 | 0.0249 |
+ | No log | 42.0 | 84 | 0.0305 |
+ | No log | 43.0 | 86 | 0.0302 |
+ | No log | 44.0 | 88 | 0.0263 |
+ | No log | 45.0 | 90 | 0.0244 |
+ | No log | 46.0 | 92 | 0.0257 |
+ | No log | 47.0 | 94 | 0.0212 |
+ | No log | 48.0 | 96 | 0.0291 |
+ | No log | 49.0 | 98 | 0.0281 |
+ | No log | 50.0 | 100 | 0.0224 |
+ | No log | 51.0 | 102 | 0.0151 |
+ | No log | 52.0 | 104 | 0.0173 |
+ | No log | 53.0 | 106 | 0.0261 |
+ | No log | 54.0 | 108 | 0.0216 |
+ | No log | 55.0 | 110 | 0.0145 |
+ | No log | 56.0 | 112 | 0.0123 |
+ | No log | 57.0 | 114 | 0.0136 |
+ | No log | 58.0 | 116 | 0.0113 |
+ | No log | 59.0 | 118 | 0.0086 |
+ | No log | 60.0 | 120 | 0.0066 |
+ | No log | 61.0 | 122 | 0.0040 |
+ | No log | 62.0 | 124 | 0.0040 |
+ | No log | 63.0 | 126 | 0.0019 |
+ | No log | 64.0 | 128 | 0.0009 |
+ | No log | 65.0 | 130 | 0.0071 |
+ | No log | 66.0 | 132 | 0.0208 |
+ | No log | 67.0 | 134 | 0.0154 |
+ | No log | 68.0 | 136 | 0.0036 |
+ | No log | 69.0 | 138 | 0.0021 |
+ | No log | 70.0 | 140 | 0.0011 |
+ | No log | 71.0 | 142 | 0.0011 |
+ | No log | 72.0 | 144 | 0.0023 |
+ | No log | 73.0 | 146 | 0.0024 |
+ | No log | 74.0 | 148 | 0.0014 |
+ | No log | 75.0 | 150 | 0.0005 |
+ | No log | 76.0 | 152 | 0.0003 |
+ | No log | 77.0 | 154 | 0.0003 |
+ | No log | 78.0 | 156 | 0.0002 |
+ | No log | 79.0 | 158 | 0.0001 |
+ | No log | 80.0 | 160 | 0.0001 |
+ | No log | 81.0 | 162 | 0.0001 |
+ | No log | 82.0 | 164 | 0.0001 |
+ | No log | 83.0 | 166 | 0.0001 |
+ | No log | 84.0 | 168 | 0.0001 |
+ | No log | 85.0 | 170 | 0.0001 |
+ | No log | 86.0 | 172 | 0.0001 |
+ | No log | 87.0 | 174 | 0.0001 |
+ | No log | 88.0 | 176 | 0.0001 |
+ | No log | 89.0 | 178 | 0.0001 |
+ | No log | 90.0 | 180 | 0.0001 |
+ | No log | 91.0 | 182 | 0.0001 |
+ | No log | 92.0 | 184 | 0.0001 |
+ | No log | 93.0 | 186 | 0.0001 |
+ | No log | 94.0 | 188 | 0.0001 |
+ | No log | 95.0 | 190 | 0.0001 |
+ | No log | 96.0 | 192 | 0.0001 |
+ | No log | 97.0 | 194 | 0.0001 |
+ | No log | 98.0 | 196 | 0.0001 |
+ | No log | 99.0 | 198 | 0.0001 |
+ | No log | 100.0 | 200 | 0.0001 |
+
+ ### Framework versions
+
+ - Transformers 4.33.3
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.14.5
+ - Tokenizers 0.13.3
config.json CHANGED
@@ -52,7 +52,7 @@
  "static_position_embeddings": false,
  "tokenizer_class": "MBart50Tokenizer",
  "torch_dtype": "float32",
- "transformers_version": "4.33.2",
+ "transformers_version": "4.33.3",
  "use_cache": true,
  "vocab_size": 250054
  }
generation_config.json CHANGED
@@ -8,5 +8,5 @@
  "max_length": 200,
  "num_beams": 5,
  "pad_token_id": 1,
- "transformers_version": "4.33.2"
+ "transformers_version": "4.33.3"
  }
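
The max_length and num_beams values above are stored in the checkpoint's generation_config.json and are applied automatically by generate(). Below is a minimal inference sketch: the repo id is this model's Hub id, the Hindi input is a placeholder, and only the target-language token needs to be forced.

```python
# Minimal inference sketch: generate() reads num_beams=5 and max_length=200
# from this checkpoint's generation_config.json, so only the target language
# is forced explicitly. The input sentence below is a placeholder.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

repo_id = "Nishant24/mbart-finetuned-hi-to-en_Siddha_Yoga_Text_by_Nishant"
tokenizer = MBart50TokenizerFast.from_pretrained(repo_id, src_lang="hi_IN")
model = MBartForConditionalGeneration.from_pretrained(repo_id)

inputs = tokenizer("<Hindi source sentence>", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],  # translate into English
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```
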
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17a2f530a74f3cda932f2f8f0904ae85cd86c03d29effa28986a1fa69ed47b63
+ size 4219