agentlans committed on
Commit 1d96d64 · verified · 1 Parent(s): b1e4a60

Update README.md

Files changed (1)
  1. README.md +18 -65
README.md CHANGED
@@ -8,28 +8,27 @@ base_model: Helsinki-NLP/opus-mt-zh-en
8
  tags:
9
  - generated_from_trainer
10
  datasets:
11
- - agentlans/en-zhtw
12
  pipeline_tag: translation
13
  ---
14
 
15
  <details>
16
  <summary>Traditional Chinese-to-English Translator</summary>
17
 
18
- This model translates Traditional Chinese sentences into English, specifically optimized to understand Taiwanese-style Traditional Chinese and deliver more accurate English translations.
19
 
20
- It is a fine-tuned version of the [Helsinki-NLP/opus-mt-zh-en](https://huggingface.co/Helsinki-NLP/opus-mt-zh-en) model, trained on the [agentlans/en-zhtw](https://huggingface.co/datasets/agentlans/en-zhtw) dataset.
21
 
22
  ## Intended Uses & Limitations
23
 
24
  ### Intended Use Cases
25
 
26
  - Translating individual sentences from Traditional Chinese to English.
27
- - Applications requiring nuanced understanding of Taiwanese-style Traditional Chinese.
28
 
29
  ### Limitations
30
 
31
  - Optimized for single-sentence translation; performance may degrade on longer texts without appropriate segmentation.
32
- - Chinese personal names are romanized using the Giles-Wade system commonly adopted in Taiwan.
33
  - May struggle with Taiwanese slang, idioms, and newly emerging expressions.
34
  - Can occasionally produce incomprehensible English due to challenges with anaphora and sentence structure differences.
35
  - Specificity issues may occur; for example, the Chinese term for "outpost" might be mistranslated as "post office," or "fur trader" as "leather dealer."
@@ -39,21 +38,20 @@ It is a fine-tuned version of the [Helsinki-NLP/opus-mt-zh-en](https://huggingfa
39
  <details>
40
  <summary>繁體中文至英文翻譯模型</summary>
41
 
42
- 該模型將繁體中文翻譯成英文,特別針對台灣風格的語言特徵進行優化,能夠提供更準確且自然的英文翻譯。
43
 
44
- 它是 [Helsinki-NLP/opus-mt-zh-en](https://huggingface.co/Helsinki-NLP/opus-mt-zh-en) 模型的微調版本,使用 [agentlans/en-zhtw](https://huggingface.co/datasets/agentlans/en-zhtw) 資料集進行訓練。
45
 
46
  ## 預期用途與限制
47
 
48
  ### 預期用途
49
 
50
  * 將單句繁體中文翻譯成英文
51
- * 適用於需要細緻理解台灣用語和語氣的應用場景
52
 
53
  ### 限制
54
 
55
  * 模型針對單句翻譯進行最佳化;若處理長文但未妥善切句,可能影響翻譯品質
56
- * 中文人名羅馬化主要依據台灣常用的 Giles-Wade 系統
57
  * 對台灣俚語、成語或新興用語的理解能力有限
58
  * 偶爾因語序與結構差異,產生不自然或難以理解的英文句子
59
  * 可能出現語意偏差
@@ -74,85 +72,40 @@ translator("《阿奇大戰鐵血戰士》是2015年4至7月黑馬漫畫和阿
74
 
75
  # 輸出
76
  # Output
77
- # The Achilles of the Battle of the Iron Man was a four-year-long series of cartoons published in the US from April to July 2015 by Alex de Kampi and by Fernando Luiss, a multinational work.
78
 
79
  # 與我自己的黃金標準翻譯比較:
80
  # Compare with my own gold standard translation:
81
  # "Archie vs. Predator" is a limited four-issue comic book series published by Black Horse and Archie Comics in the United States from April to July 2015. It was created by Alex de Campi and drawn by Fernando Ruiz. It's a crossover work.
82
  ```
83
 
84
- ## More examples / 更多範例
85
-
86
- <details>
87
- <summary>Click here / 點這裡</summary>
88
-
89
- ```
90
- Chinese: 地政局長陳淑美表示,南科是帶動臺南產業發展的引擎,目前積極辦理相關園區用地作業,可提供產業專用地面積總計約 122 公頃,配合周邊開發也已規畫完善足夠的生活機能用地,共 318.45 公頃。
91
- English: Chen Shu-mei, director of the Department of Land Affairs, says that STSIP is the engine driving the development of Tainan's industry. Currently it is actively handling the use of land in related parks, and can provide a total of 122 hectares of specialized land for commercial use. In conjunction with the development of the surrounding areas, STSIP has already mapped out an adequate amount of operational land for use in Tainan.
92
-
93
- Chinese: 那些屍體可能是屬於俄羅斯皮貨商和來自舊金山以北不遠的俄羅斯哨所羅斯堡。
94
- English: Those corpses may have belonged to Russian leather dealers and to Russian post office Losburg, not far north of San Francisco.
95
-
96
- Chinese: 過去沒有人想過要知道田裡有多少福壽螺,但要評估陷阱的效率,就得先知道族羣量等資訊,實驗就在今年一期稻作時展開。
97
- English: In the past, no one had thought about how many fortune snails there were in the fields. But to evaluate the effectiveness of the traps, one had to first learn about the numbers of the tribes, and the experiment was carried out during this year's rice harvest.
98
-
99
- Chinese: 1924年第二次直奉戰爭爆發,姜任鎮威軍第1軍軍長。
100
- English: In 1924, the second direct war broke out, with Chiang serving as head of the 1st Division of the Zhenwei Army.
101
-
102
- Chinese: 1984年畢業於列寧格勒蘇聯貿易學院,並獲得經濟學學位。
103
- English: He graduated from the Leningrad Institute of Trade in 1984, and earned a degree in economics.
104
-
105
- Chinese: 1842年直布羅陀教區成立,它升為主教座堂,首任主教喬治·湯姆林森。
106
- English: Established in 1842, the Cathedral of Hippocrats was raised to the rank of bishop, the first of its kind, George Tom Linson.
107
-
108
- Chinese: 世大運女子籃球金牌戰中國對戰日本,中國隊球員韓旭 在比賽中防守。
109
- English: The women's basketball team competed against Japan in the gold medal, and Han Hsu, a Chinese player, stood guard in the game.
110
-
111
- Chinese: 地方政府將建築工地的監督權責全推給監造人,有道理嗎?
112
- English: Is it plausible that local governments place the responsibility for overseeing construction sites entirely on the architects?
113
-
114
- Chinese: 埃及媒體 報導,埃及軍事生產部 25 日與日本一家公司簽署協議,當地量產可由空氣造水的設備。
115
- English: Egyptian media reported that the Egyptian Ministry of Military Production had signed a 25-day agreement with a Japanese company to produce air-powered water in bulk.
116
-
117
- Chinese: 每年的降水量大部分發生在3月初到6月底之間。
118
- English: Most annual precipitation occurs between the beginning of March and the end of June.
119
- ```
120
-
121
- </details>
122
-
123
  ## Training procedure / 訓練過程
124
 
125
  <details>
126
  <summary>Click here / 點這裡</summary>
127
 
128
- ### Training Hyperparameters
129
 
130
  The following hyperparameters were used during training:
131
-
132
  - learning_rate: 5e-05
133
  - train_batch_size: 8
134
  - eval_batch_size: 8
135
  - seed: 42
136
  - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
137
  - lr_scheduler_type: linear
138
- - num_epochs: 10.0
139
 
140
- ### Training Results
141
 
142
  | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
143
  |:-------------:|:-----:|:------:|:---------------:|:-----------------:|
144
- | 2.1687 | 1.0 | 41656 | 2.0160 | 22482354 |
145
- | 2.0294 | 2.0 | 83312 | 1.9505 | 44979297 |
146
- | 1.8953 | 3.0 | 124968 | 1.9115 | 67451077 |
147
- | 1.7876 | 4.0 | 166624 | 1.8912 | 89955187 |
148
- | 1.7266 | 5.0 | 208280 | 1.8687 | 112427069 |
149
- | 1.6689 | 6.0 | 249936 | 1.8629 | 134906493 |
150
- | 1.583 | 7.0 | 291592 | 1.8540 | 157415170 |
151
- | 1.5274 | 8.0 | 333248 | 1.8504 | 179896969 |
152
- | 1.4662 | 9.0 | 374904 | 1.8432 | 202377935 |
153
- | 1.3969 | 10.0 | 416560 | 1.8426 | 224864981 |
154
-
155
- ### Framework Versions
156
 
157
  - Transformers 4.51.3
158
  - Pytorch 2.6.0+cu124
 
8
  tags:
9
  - generated_from_trainer
10
  datasets:
11
+ - agentlans/en-zhtw-google-translate
12
  pipeline_tag: translation
13
  ---
14
 
15
  <details>
16
  <summary>Traditional Chinese-to-English Translator</summary>
17
 
18
+ This model translates Traditional Chinese sentences into English.
19
 
20
+ It is a fine-tuned version of the [Helsinki-NLP/opus-mt-zh-en](https://huggingface.co/Helsinki-NLP/opus-mt-zh-en) model, trained on the [agentlans/en-zhtw-google-translate](https://huggingface.co/datasets/agentlans/en-zhtw-google-translate) dataset.
21
 
22
  ## Intended Uses & Limitations
23
 
24
  ### Intended Use Cases
25
 
26
  - Translating individual sentences from Traditional Chinese to English.
27
+ - Applications requiring understanding of Taiwanese-style Traditional Chinese.
28
 
29
  ### Limitations
30
 
31
  - Optimized for single-sentence translation; performance may degrade on longer texts without appropriate segmentation.
 
32
  - May struggle with Taiwanese slang, idioms, and newly emerging expressions.
33
  - Can occasionally produce incomprehensible English due to challenges with anaphora and sentence structure differences.
34
  - Specificity issues may occur; for example, the Chinese term for "outpost" might be mistranslated as "post office," or "fur trader" as "leather dealer."
 
38
  <details>
39
  <summary>繁體中文至英文翻譯模型</summary>
40
 
41
+ 該模型將繁體中文翻譯成英文。
42
 
43
+ 它是 [Helsinki-NLP/opus-mt-zh-en](https://huggingface.co/Helsinki-NLP/opus-mt-zh-en) 模型的微調版本,使用 [agentlans/en-zhtw-google-translate](https://huggingface.co/datasets/agentlans/en-zhtw-google-translate) 資料集進行訓練。
44
 
45
  ## 預期用途與限制
46
 
47
  ### 預期用途
48
 
49
  * 將單句繁體中文翻譯成英文
50
+ * 適用於需要理解台灣用語和語氣的應用場景
51
 
52
  ### 限制
53
 
54
  * 模型針對單句翻譯進行最佳化;若處理長文但未妥善切句,可能影響翻譯品質
 
55
  * 對台灣俚語、成語或新興用語的理解能力有限
56
  * 偶爾因語序與結構差異,產生不自然或難以理解的英文句子
57
  * 可能出現語意偏差
 
72
 
73
  # 輸出
74
  # Output
75
+ # The Iron Blood Warriors of the Achilles War is a four-term, serial comic book on black horse comics and Achilles comics released in the United States from April to July 2015. It was created by Alex De Campi and illustrated by Fernando Ruiz. It is a cross-corporate cross-border work.
76
 
77
  # 與我自己的黃金標準翻譯比較:
78
  # Compare with my own gold standard translation:
79
  # "Archie vs. Predator" is a limited four-issue comic book series published by Black Horse and Archie Comics in the United States from April to July 2015. It was created by Alex de Campi and drawn by Fernando Ruiz. It's a crossover work.
80
  ```
81
 
82
  ## Training procedure / 訓練過程
83
 
84
  <details>
85
  <summary>Click here / 點這裡</summary>
86
 
87
+ ### Training hyperparameters
88
 
89
  The following hyperparameters were used during training:
 
90
  - learning_rate: 5e-05
91
  - train_batch_size: 8
92
  - eval_batch_size: 8
93
  - seed: 42
94
  - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
95
  - lr_scheduler_type: linear
96
+ - num_epochs: 5.0
97
 
98
+ ### Training results
99
 
100
  | Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
101
  |:-------------:|:-----:|:------:|:---------------:|:-----------------:|
102
+ | 1.3772 | 1.0 | 99952 | 1.2156 | 54090088 |
103
+ | 1.2001 | 2.0 | 199904 | 1.1147 | 108157960 |
104
+ | 1.0933 | 3.0 | 299856 | 1.0592 | 162248288 |
105
+ | 0.9897 | 4.0 | 399808 | 1.0107 | 216341560 |
106
+ | 0.9016 | 5.0 | 499760 | 0.9878 | 270444104 |
107
+
108
+ ### Framework versions
109
 
110
  - Transformers 4.51.3
111
  - Pytorch 2.6.0+cu124
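
The Limitations section of the README notes that the model is optimized for single-sentence translation and that quality may degrade on longer texts without appropriate segmentation. A minimal sketch of one way to segment input is shown below. It assumes that splitting on the full-width sentence-ending marks 。!? (and …) is adequate for the text at hand; `split_sentences` is a hypothetical helper, not part of the model card or of any library.

```python
import re

# Sentence-ending punctuation common in Chinese text, followed by any
# closing quotes or brackets that belong to the same sentence.
# Assumption: these marks are sufficient for the input being translated.
_SENTENCE_END = re.compile(r"([。!?…]+[」』”’)》]*)")


def split_sentences(text: str) -> list[str]:
    """Split Chinese text into sentences, keeping the end punctuation."""
    # With one capturing group, re.split alternates text and delimiter:
    # [text, delim, text, delim, ..., tail]
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    # Whatever follows the last delimiter is an unterminated final sentence.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for s in split_sentences("今天天氣很好。你要去哪裡?我不知道"):
        print(s)
    # With the `translator` pipeline object from the README's usage example
    # (assumed to be a Transformers translation pipeline), each sentence
    # could then be translated separately, for example:
    #   outputs = [translator(s)[0]["translation_text"] for s in sentences]
```

Translating sentence by sentence keeps each input close to what the model saw during fine-tuning, at the cost of losing cross-sentence context such as pronoun references.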