File size: 43,833 Bytes
06c89da 86b66f0 755991a 86b66f0 755991a 1600cfe 06c89da cad50b1 a18998c cd9f0b8 aa7768e 86b66f0 668e303 86b66f0 6f2324f 3b2dbe2 cd9f0b8 6f2324f 3d93f65 cd9f0b8 3d93f65 cd9f0b8 755991a 06c89da fcd1a78 06c89da 755991a 06c89da d3751b0 06c89da aa7768e 3bcafcd 336d13a 3bcafcd 11be5d4 06c89da 668e303 dfc79c6 06c89da 755991a 06c89da 4ab4804 06c89da a34415a 755991a a34415a c7f0cc8 06c89da 0f0cef0 c7f0cc8 0f0cef0 c7f0cc8 668e303 b53ed6b 755991a 06c89da a34415a 755991a 06c89da 336d13a 755991a 06c89da bfef67e 755991a cd9f0b8 06c89da 755991a 3fa2797 06c89da 755991a 3fa2797 ccce33d 755991a cd9f0b8 755991a ccce33d cd9f0b8 641542b 755991a cd9f0b8 755991a cd9f0b8 06c89da 641542b 755991a 06c89da cd9f0b8 3d93f65 cd9f0b8 641542b be4db5b 641542b be4db5b 641542b cd9f0b8 641542b 66451d4 641542b 66451d4 b53ed6b cd9f0b8 ffc9b0b bb9ec2c cd9f0b8 3b2dbe2 cd9f0b8 3b2dbe2 cd9f0b8 be4db5b cd9f0b8 3b2dbe2 cd9f0b8 3b2dbe2 38eea11 cd9f0b8 3d93f65 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 d4bbde9 cd9f0b8 d4bbde9 cd9f0b8 3d93f65 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 641542b cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 3d93f65 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 4ab4804 cd9f0b8 4ab4804 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 3d93f65 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 66451d4 cd9f0b8 cad50b1 cd9f0b8 ffc9b0b cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 cad50b1 cd9f0b8 d4bbde9 cd9f0b8 3d93f65 cd9f0b8 668e303 cd9f0b8 cad50b1 cd9f0b8 3fa2797 cd9f0b8 3fa2797 cd9f0b8 3fa2797 cd9f0b8 b53ed6b cd9f0b8 cad50b1 cd9f0b8 3fa2797 cd9f0b8 3fa2797 cad50b1 3fa2797 cd9f0b8 8966cff e1c6b2d 755991a 06c89da b17e0cb 06c89da c7f0cc8 06c89da d1cc8d7 c7f0cc8 8a5bd51 06c89da 755991a 06c89da 755991a 06c89da a34415a 06c89da c7f0cc8 06c89da d1cc8d7 06c89da 755991a c7f0cc8 755991a 86b66f0 06c89da 755991a be4db5b 86b66f0 06c89da 755991a 06c89da 755991a 06c89da 755991a 06c89da 1600cfe |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 |
---
library_name: peft
base_model: unsloth/gemma-2-9b-it-bnb-4bit
language:
- ja
- en
tags:
- translation
- qlora
- gemma2
- text-generation-inference
- nlp
license: apache-2.0
---
![image/png](c3tr-logo.png)
# News
## 2024.07.20
C3TR-AdapterのVersion3を公開しました。
Version 3 of C3TR-Adapter has been released.
version3では4つのベンチマークのうち、1つでgpt4 turboを上回るという大幅な性能底上げが達成されています。
Version 3 achieved a significant performance boost, beating GPT4 Turbo in one of the four benchmarks.
## 2024.05.17
[C3TR-AdapterのVersion2](https://huggingface.co/webbigdata/C3TR-Adapter/tree/version2)を公開しました。
[Version 2 of C3TR-Adapter](https://huggingface.co/webbigdata/C3TR-Adapter/tree/version2) has been released.
Version2では主にカジュアルな会話に関する翻訳能力が大幅に向上しています。
Version 2 has greatly improved the ability to translate casual conversations.
その反面、フォーマルな文章の翻訳能力が少し落ちてしまっています。フォーマルな文章を対象にする場合、[Version1](https://huggingface.co/webbigdata/C3TR-Adapter/tree/version1)を引き続きお使いください
On the other hand, translation capabilities for formal texts have declined slightly. If you are targeting formal texts, please continue to use [Version1](https://huggingface.co/webbigdata/C3TR-Adapter/tree/version1).
# モデルカード(Model Card for Model ID)
C3TR-AdapterはGoogleが発表したLLMであるgemma-2-9bの日英・英日翻訳性能を向上させるQLoRA Adapterです。
C3TR-Adapter is a QLoRA Adapter that improves the Japanese-English and English-Japanese translation performance of gemma-2-9b released by Google.
## モデル詳細(Model Details)
C3TR-Adapterは翻訳ベンチマークで多言語翻訳モデルであるGoogleのMadlad400やmetaのSeamless m4t v2 large、[ALMA-Ja-V2](https://huggingface.co/webbigdata/ALMA-7B-Ja-V2) (私達の以前のllama 2ベースのモデル)よりも大幅に優れた日英・日英翻訳性能を持っています。
Benchmarks show significantly better English-Japanese and Japanese-English translation performance than Google's Madlad400, META's Seamless m4t v2 large, and [ALMA-Ja-V2](https://huggingface.co/webbigdata/ALMA-7B-Ja-V2) (our previous llama2 model).
![image/png](c3tr-version3.png)
翻訳タスクに関しては、より大きなモデルに負けない性能を発揮します
元の画像クレジット Sebastian Ruder(@seb_ruder)
For translation tasks, it performs as well as larger models.
Original image credit: Sebastian Ruder (@seb_ruder)
翻訳ベンチマークの実行方法やその他のベンチマーク結果については[JTransBench](https://github.com/webbigdata-jp/JTransBench)を参考にしてください。
For instructions on how to run the translation benchmark and other benchmark results, please refer to [JTransBench](https://github.com/webbigdata-jp/JTransBench).
GoogleのウェブサービスColabを使うと無料でC3TR-Adapterを試す事が出来ます。リンク先でOpen In Colabボタンを押して起動してください。
You can try C3TR-Adapter for free using Google's web service Colab. Please press the Open In Colab button on the link to activate it.
- [動作確認用の簡単なサンプル(A simple sample to check the operation)](https://github.com/webbigdata-jp/python_sample/blob/main/C3TR_Adapter_v3_Japanese_English_Translation_sample_code.ipynb)
- [テキストファイルを一括で日英・英日翻訳するサンプル(Sample of batch translation of text files)](https://github.com/webbigdata-jp/python_sample/blob/main/C3TR_Adapter_v3_batch_translation_sample.ipynb)
- [GPUがない環境でも動かす事ができるgguf版(A gguf version that can be run in environments without a GPU)](https://huggingface.co/webbigdata/C3TR-Adapter_gguf)
### モデルの動かし方(How to use Model)
自分のパソコンで動かす場合は、少なくとも約8.3GB以上のGPU RAMが必要です。GPUメモリが足りない場合は上記のgguf版を試すか、パラメーターを調整してください(max_length、max_new_tokens, num_beamsを減らす)
If you want to run it on your own local computer, you will need at least approximately 8.3 GB or more of GPU RAM.If you do not have enough GPU memory, try the gguf version above or decrease parameters(max_length、max_new_tokens, num_beams).
必要なライブラリのインストール(Installation of required libraries)
```
# もし、pytorchがまだインストールされていなかったら公式マニュアルを参考にインストールしてください
# If pytorch is not already installed, please refer to the official manual to install it.
# https://pytorch.org/get-started/locally/#start-locally
# example for linux user with CUDA 12.1.
# pip3 install torch torchvision torchaudio
# example for windows user with CUDA 12.1.
# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# Gemma 2は最新のライブラリでなくては動かないので、以下のVersionに更新してください
# Gemma 2 will not work without the latest library, so please update to the following version
pip install transformers==4.42.3
pip install peft==0.11.1
pip install bitsandbytes==0.43.1
```
サンプルスクリプト(sample script)
```
import torch
import os
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
model_id = "unsloth/gemma-2-9b-it-bnb-4bit"
peft_model_id = "webbigdata/C3TR-Adapter"
if torch.cuda.is_available() and torch.cuda.get_device_capability(0)[0] >= 8:
dtype = torch.bfloat16
else:
dtype = torch.float16
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, device_map="auto")
model = PeftModel.from_pretrained(model = model, model_id = peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.unk_token
def trans(my_str):
input_ids = tokenizer(my_str, return_tensors="pt",
padding=True, max_length=1800, truncation=True).input_ids.cuda()
# Translation
generated_ids = model.generate(input_ids=input_ids,
max_new_tokens=900, use_cache=True,
do_sample=True, num_beams=3, temperature=0.5, top_p=0.3,
repetition_penalty=1.0
)
full_outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
return full_outputs[0].split("### Response:\n")[-1].strip()
ret = trans("""You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.
<start_of_turn>### Instruction:
Translate Japanese to English.
### Input:
あら?また夜食を食べてるの?
こんにゃくは太りません
<end_of_turn>
<start_of_turn>### Response:
""")
print(ret)
```
### プロンプトフォーマット prompt format
プロンプトフォーマットは独自です。
The prompt format is original.
Version1とVersion2(システムプロンプト追加)とVersion3(```<start_of_turn>```と```<end_of_turn>```追加)ではプロンプトフォーマットも変わっています。
The prompt format has changed between Version 1, Version2(add system prompts) and Version 3(add ```<start_of_turn>```and```<end_of_turn>```).
```
You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.
<start_of_turn>### Instruction:
Translate Japanese to English.
### Input:
**Some Japanese text**
<end_of_turn>
<start_of_turn>### Response:
```
または or
```
You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.
<start_of_turn>### Instruction:
Translate English to Japanese.
### Input:
**Some English text**
<end_of_turn>
<start_of_turn>### Response:
```
プロンプトテンプレート内に余分な空白や改行、```<start_of_turn>```と```<end_of_turn>```の漏れはモデルの誤動作(出力が途切れたり繰り返す、余分な文章が付加される等)に繋がるのでテンプレートにミスがないようにしてください
Extra spaces, line breaks, and omission of ```<start_of_turn>``` or ```<end_of_turn>``` in the prompt template will cause the model to malfunction (output will be truncated or repeated, extra sentences will be added, etc.), so please make sure there are no errors in the template.
Version2からは実験的な試みとして、翻訳時にヒントを与える事が出来るようになっています。
Starting with Version 2, as an experimental attempt, it is now possible to provide hints during translation.
### (1)文体(writing style)
[writing_style: STYLE_NAME]
現在は試験的に11のwriteing styleをテスト実装しています。
We are currently testing 11 writing styles.
casual, formal, technical, journalistic, web-fiction, business, nsfw, educational-casual, academic-presentation, slang, sns-casual
仕事場などではbusinessを使います
In the workplace, we use business.
```
You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating.
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: business]
### Input:
お疲れ様です、本日の資料を送ります。
<end_of_turn>
<start_of_turn>### Response:
Thank you for your hard work today. I am sending today's materials.
```
以降の例ではsystem promptを省略していますが、実際に動かす際にはsystem promptを追加してください。
The following examples omit the system prompt, but be sure to add it when running the commands.
コピペなどではslangやcasualを使います
Use slang or casual language when meme.
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: slang]
[牛鮭定食: Beef salmon set meal]
### Input:
そんな事より >>1 よ、ちょいと聞いてくれよ。スレとあんま関係ないけどさ。
このあいだ、近所の吉野家行ったんです。吉野家。
そしたらなんか人がめちゃくちゃいっぱいで座れないんです。
で、よく見たらなんか垂れ幕下がってて、150円引き、とか書いてあるんです。
もうね、アホかと。馬鹿かと。
お前らな、150円引き如きで普段来てない吉野家に来てんじゃねーよ、ボケが。
150円だよ、150円。
なんか親子連れとかもいるし。一家4人で吉野家か。おめでてーな。
よーしパパ特盛頼んじゃうぞー、とか言ってるの。もう見てらんない。
お前らな、150円やるからその席空けろと。
吉野家ってのはな、もっと殺伐としてるべきなんだよ。
Uの字テーブルの向かいに座った奴といつ喧嘩が始まってもおかしくない、
刺すか刺されるか、そんな雰囲気がいいんじゃねーか。女子供は、すっこんでろ。
で、やっと座れたかと思ったら、隣の奴が、大盛つゆだくで、とか言ってるんです。
そこでまたぶち切れですよ。
あのな、つゆだくなんてきょうび流行んねーんだよ。ボケが。
得意げな顔して何が、つゆだくで、だ。
お前は本当につゆだくを食いたいのかと問いたい。問い詰めたい。小1時間問い詰めたい。
お前、つゆだくって言いたいだけちゃうんかと。
吉野家通の俺から言わせてもらえば今、吉野家通の間での最新流行はやっぱり、
ねぎだく、これだね。
大盛りねぎだくギョク。これが通の頼み方。
ねぎだくってのはねぎが多めに入ってる。そん代わり肉が少なめ。これ。
で、それに大盛りギョク(玉子)。これ最強。
しかしこれを頼むと次から店員にマークされるという危険も伴う、諸刃の剣。
素人にはお薦め出来ない。
まあお前、>>1は、牛鮭定食でも食ってなさいってこった
<end_of_turn>
<start_of_turn>### Response:
Instead of that >>1, hey, just listen for a bit. It's not really related to the thread.
The other day, I went to a Yoshinoya near my place. Yoshinoya.
Then, there were so many people that I couldn't find a seat.
And when I took a closer look, there was a banner hanging down, saying "150 yen off," or something like that.
I mean, what a bunch of idiots.
You guys, don't come to Yoshinoya just for a 150 yen discount, you idiots.
It's only 150 yen, 150 yen.
There were even some families with kids. A family of four at Yoshinoya. Congratulations.
"Let's order the extra-large portion," they say. I can't take it anymore.
You guys, if you're going to spend 150 yen, clear your seat.
Yoshinoya should be a more brutal place.
You should be ready to fight the guy sitting across from you at the U-shaped table at any moment.
A place where you're either stabbing or being stabbed. Women and children, get out of the way.
And just when I finally got a seat, the guy next to me said, "I'll have the extra-large portion with extra sauce."
And I lost it again.
You know, extra sauce isn't popular these days, you idiot.
What are you so proud of, ordering extra sauce?
I want to ask you, do you really want to eat extra sauce? I want to interrogate you for an hour.
You just want to say "extra sauce," don't you?
As a Yoshinoya regular, I can tell you that the latest trend among Yoshinoya regulars is still,
extra onions. That's the way to go.
Extra-large portion with extra onions. That's the regular's order.
Extra onions means more onions and less meat. That's it.
And with that, an extra-large egg. That's the best.
However, ordering this comes with the risk of being marked by the staff. It's a double-edged sword.
I don't recommend it for amateurs.
Well, you, >>1, why don't you just order the beef salmon set meal?
```
#### (2)固有名詞の読み方 How to read proper nouns
[英語名称: 日本語訳] またはその逆。
[English name: Japanese translation name] or vice versa
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: formal]
[羽生結弦: Yuzuru Hanyu]
[羽生善治: Yoshiharu Habu]
### Input:
フィギュアスケートの羽生結弦さんが将棋棋士の羽生善治さんと対談した
<end_of_turn>
<start_of_turn>### Response:
Figure skater Yuzuru Hanyu had a conversation with shogi player Yoshiharu Habu.
```
#### (3)キャラクタースタイル character_style
[XXXX_character_style: YYYY]
キャラクタースタイルで性別や個性を指定する事ができます
You can specify gender and personality in the character style.
男性指定 Male designated
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: formal]
[青山樹_character_style: male]
[青山樹: AOYAMA Itsuki]
### Input:
青山樹は週末に友達とキャンプに行って、自然を楽しんだ。そして時計を紛失した。
<end_of_turn>
<start_of_turn>### Response:
Aoyama Itsuki went camping with his friends on the weekend and enjoyed nature. However, he lost his watch.
```
女性指定 Female designated
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: formal]
[青山樹_character_style: female]
[青山樹: Itsuki Aoyama]
### Input:
青山樹は週末に友達とキャンプに行って、自然を楽しんだ。そして時計を紛失した。
<end_of_turn>
<start_of_turn>### Response:
Itsuki Aoyama went camping with friends on the weekend and enjoyed nature. However, she lost her watch.
```
ノンバイナリー指定 nonbinary designated
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: formal]
[青山樹_character_style: nonbinary]
[青山樹: Tatsuki Aoyama]
### Input:
青山樹は週末に友達とキャンプに行って、自然を楽しんだ。そして時計を紛失した。
<end_of_turn>
<start_of_turn>### Response:
Tatsuki Aoyama went camping with their friends on the weekend and enjoyed nature. They lost their watch.
```
残念ながら現時点では性別の指定は本文の内容が優先されるため、例えば以下の文章では性別指定が有効になりません。
以下の例では本文内の「俺は男だよ!」を消せば性別指定が有効になります。
また、bfloat16が扱えないColabの無料版などではこの指定が無視されてしまうようです。
Unfortunately, at present, the content of the text takes priority when designating gender, so for example, the gender designation will not be effective in the following sentence.
In the example below, if you delete "俺は男だよ!(I'm a guy!)" from the text, the gender specification will be effective.
Also, this specification seems to be ignored in the free version of Colab, which cannot handle bfloat16.
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: web-fiction]
[カミーユ: kamille]
[kamille_character_style: female, rough, teenager]
[ジュリド: Jerid]
[ティターンズ: Titans]
[エマ: Emma]
[エゥーゴ: A.E.U.G.]
### Input:
ジェリド「カミーユ?女の名前なのに・・・何だ、男か。」
カミーユ「なめるな!!」
ジェリド「うわ!」
エマ「やめなさい!」
ジェリド「オレ達をティターンズと知ってちょっかいを出してきたのか?」
カミーユ「カミーユが男の名前で何で悪いんだ!!!俺は男だよ!」
こうして地球連邦のエリート部隊・ティターンズを殴った罪で拘束された後、母を失い、反地球連邦組織『エゥーゴ』に参加しました。
<end_of_turn>
<start_of_turn>### Response:
Jerid: "Kamille? That's a woman's name... What, are you a man?"
Kamille: "Don't underestimate me!!"
Jerid: "Whoa!"
Emma: "Stop it!"
Jerid: "Did you provoke us because you know we're from the Titans?"
Kamille: "What's wrong with my name being Kamille? I'm a man!"
After being arrested for assaulting members of the Earth Federation's elite unit, the Titans, Kamille lost his mother and joined the anti-Earth Federation organization, A.E.U.G.
```
character_styleとwriting_styleを組み合わせる
Combining character_style and writing_style
以下の例では段々と丁寧な言い回しに変化しています
In the following example, the phrase gradually changes to a more polite one.
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: slang]
[speaker_character_style: vulgar]
### Input:
今日の会議は非常に重要ですので、時間通りに来てください。
<end_of_turn>
<start_of_turn>### Response:
Today's meeting is super important, so show up on time.
```
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: casual]
[speaker_character_style: rough]
### Input:
今日の会議は非常に重要ですので、時間通りに来てください。
<end_of_turn>
<start_of_turn>### Response:
Today's meeting is very important, so please come on time.
```
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: formal]
[speaker_character_style: noble]
### Input:
今日の会議は非常に重要ですので、時間通りに来てください。
<end_of_turn>
<start_of_turn>### Response:
Today's meeting is of great importance, so please come on time.
```
日本語でも同様に丁寧になっていっています
The Japanese language is also becoming more polite.
```
<start_of_turn>### Instruction:
Translate English to Japanese.
When translating, please use the following hints:
[writing_style: slang]
[speaker_character_style: vulgar]
### Input:
Since today's meeting is very important, please arrive on time.
<end_of_turn>
<start_of_turn>### Response:
今日の会議は非常に重要なので、時間厳守で来てください。
```
```
<start_of_turn>### Instruction:
Translate English to Japanese.
When translating, please use the following hints:
[writing_style: casual]
[speaker_character_style: rough]
### Input:
Since today's meeting is very important, please arrive on time.
<end_of_turn>
<start_of_turn>### Response:
今日の会議は超重要だから、時間厳守で来い。
```
```
<start_of_turn>### Instruction:
Translate English to Japanese.
When translating, please use the following hints:
[writing_style: formal]
[speaker_character_style: noble]
### Input:
Since today's meeting is very important, please arrive on time.
<end_of_turn>
<start_of_turn>### Response:
今日の会議は非常に重要なので、時間厳守で参列してください。
```
#### (4)一人称と語尾(First person and ending)
キャラクターの一人称と語尾を指定する事ができます。
この機能はまだ非常に実験的な機能であり、現時点では不完全です。
You can specify the first person and ending for your character.
This feature is still highly experimental and incomplete at this time.
映画「her」より一場面(A scene from the movie "her")
```
<start_of_turn>### Instruction:
Translate English to Japanese.
When translating, please use the following hints:
[writing_style: web-fiction, nsfw]
[THEODORE: セオドア]
[SEXYKITTEN: セクシーな子猫]
[SEXYKITTEN_character_style: female]
[THEODORE_character_style: male]
[SEXYKITTEN_first_person_and_ending: 私, だわ]
[THEODORE_first_person_and_ending: 僕, だよ]
### Input:
SEXYKITTEN:Choke me with that dead cat!
THEODORE:(breathing hard about to climax) What?
SEXYKITTEN:(fully into it) The dead cat next to the bed. Choke me with it!
Beat. He's taken out of it.
THEODORE:(uncomfortable, trying to play along) Um, okay.
SEXYKITTEN:Tell me.
THEODORE:I'm choking you with the cat.
SEXYKITTEN:TELL ME! Keep telling me!
THEODORE:I've got it's tail and I'm choking you with the cat's tail.
SEXYKITTEN:YEAH, YOU ARE! FUCK! TELL ME!
THEODORE:I'm choking you and it's tail is around your neck. It's so tight around your neck.
SEXYKITTEN:YES! YES!
Theodore doesn't know what to say. He doesn't want to offend her.
THEODORE:I'm pulling it. I'm pulling it. The cat's dead. It's a dead cat around your neck and I'm pulling it.
SEXYKITTEN:AHHHHHHHHHHHH. OH MY GOD!
Her breathing is slowing down.
SEXYKITTEN:Oh god, I came so hard.
THEODORE:Yeah. Me too.
SEXYKITTEN:Okay, good night.
Theodore takes his earpiece out and stares at the ceiling.
### Response:
セクシーな子猫:その死んだ猫で私を絞めつけろ!
セオドア:(息が荒く、快楽に近づく) 何?
セクシーな子猫:(夢中になっている) ベッドの横の死んだ猫だ。それを私を絞めつけろ!
沈黙。彼はその場から引き離される。
セオドア:(不快に、でも付き合おうとする) うん、わかった。
セクシーな子猫:言ってみろ。
セオドア:猫で君を絞めている。
セクシーな子猫:言ってみろ!もっと言ってみろ!
セオドア:猫の尻尾を掴んで君を絞めている。
セクシーな子猫:そうだ、そうなんだ!くそっ!言ってみろ!
セオドア:猫で君を絞めていて、その尻尾が君の首に巻かれている。君の首にすごくきつく巻かれている。
セクシーな子猫:ああ!ああ!
セオドアは何を言ったらいいのかわからない。彼女を傷つけたくない。
セオドア:引っ張っている。引っ張っている。猫は死んでいる。君の首に死んだ猫が巻かれていて、それを引っ張っている。
セクシーな子猫:ああああああああ。ああ、神様!
彼女の呼吸は遅くなっている。
セクシーな子猫:ああ、神様、すごく気持ちよかった。
セオドア:うん。僕も同じだ。
セクシーな子猫:わかった、おやすみ。
セオドアはイヤホンを外して天井を見つめる。
```
漫画「鬼滅の刃」より一場面(A scene from the manga "Demon Slayer")
```
<start_of_turn>### Instruction:
Translate Japanese to English.
When translating, please use the following hints:
[writing_style: web-fiction]
[釜鵺: Kamanue]
[零余子: Mukago]
[鬼舞辻 無惨: Muzan Kibutsuzi]
[病葉: Wakuraba]
[累: Rui]
[十二鬼月: the Twelve Kizuki]
[魘夢: Enmu]
[轆轤: Rokuro]
### Input:
釜鵺: 『無惨様だ、無惨様の声。わからなかった。姿も気配も以前と違う。凄まじい精度の擬態。』
零余子: 「も、申し訳ございません。お姿も気配も異なっていらしたので。」
無惨:「誰が喋って良いと言った。貴様共の下らぬ意志で物を言うな、私に聞かれたことにのみ答えよ。累が殺された。下弦の伍だ。私が問いたいのは一つのみ、何故に下弦の鬼はそれ程までに弱いのか。十二鬼月に数えられたからと言ってそこで終わりではない、そこから始まりだ。より人を喰らい、より強くなり、私の役に立つための始まり。ここ百年余十二鬼月の上限は顔ぶれが変わらない。鬼狩りの柱共を葬ってきたのは常に上弦の鬼たちだ。しかし下弦はどうか、何度入れ替わった。」
釜鵺: 『そんなことを俺達に言われても。』
無惨: 「そんなことを俺達に言われても、何だ、言ってみろ。」
釜鵺: 『思考が読めるのか、まずい。』
無惨: 「何がまずい、、、言ってみろ。。」
釜鵺: 「お許しくださいませ鬼舞辻様。どうか、どうかお慈悲を。申し訳ありません、申し訳ありません。申し訳あ、、、ひゃぁ、、、。」
無惨、釜鵺を手にかける
病葉: 『何でこんなことで、殺されるのか。 せっかく十二鬼月になれたのに、なぜだ、なぜだ。俺はこれから、もっと、もっと。」
無惨: 「私よりも鬼狩りの方が怖いか。」
零余子: 「いいえ。」
無惨: 「お前はいつも鬼狩りの柱と遭遇した場合、逃亡しようと思っているな。」
零余子: 「いいえ思っていません。私はあなた様の為に命をかけて戦います。」
無惨: 「お前は私が言うことを否定するのか。」
無惨、零余子を手にかける
病葉: 『ダメだ、お終いだ。思考は読まれ、肯定しても否定しても殺される。戦って勝てるはずもない。なら、逃げるしか!』
魘夢: 「愚かだな~。」
病葉: 『何とか逃げ切れ、何とか。これだけ離れれば。』
無惨、病葉を手にかける
無惨: 「もはや十二鬼月は上弦のみで良いと思っている。下弦の鬼は解体する。!」
病葉: 『やられている?そんな。琵琶の女の能力か、いや、琵琶の音はしなかった。ぐぅぅ何故だ、体を再生できない。」
無惨: 「最後に何か言い残すことは。」
轆轤: 「私はまだお役に立てます!もう少しだけご猶予を頂けるのならば必ずお役に。」
無惨: 「具体的にどれほどの猶予を。お前はどの様な役に立てる。今のお前の力でどれほどの事ができる。」
轆轤: 「血を、貴方様の血を分けていただければ私は必ず血に順応してみせます。より強力な鬼となり戦います。」
無惨: 「何故私がお前の指図で血を与えねばならんのだ。甚だ図々しい、身の程をわきまえろ。」
轆轤: 「違います、違います、私は。」
無惨: 「黙れ。何も違わない。私は何も間違えない。全ての決定権は私にあり、私の言うことは絶対である。お前に拒否する権利はない、私が正しいと言ったことが正しいのだ。お前は私に指図した。死に値する。」
無惨、轆轤を手にかける
無惨: 「最後に言い残すことは。」
病葉: 「こいつも殺される。この方の気分次第で全て決まる。俺ももう死ぬ。」
<end_of_turn>
<start_of_turn>### Response:
Kamanue: "It's Muzan-sama, Muzan-sama's voice. I couldn't tell. His appearance and presence are different from before. An incredibly precise mimicry."
Mukago: "I-I apologize. His appearance and presence were different."
Muzan: "Who said you could speak? Don't speak with your lowly intentions. Answer only to what I ask. Rui was killed. He was the Lower Five. I have only one question: why are the Lower Rank demons so weak? Just because you're counted among the Twelve Kizuki doesn't mean it ends there. It's the beginning. The beginning of eating more people, becoming stronger, and being useful to me. For the past hundred years, the faces of the Twelve Kizuki have remained unchanged. It's always been the Upper Rank demons who have killed the Hashira. But what about the Lower Rank? How many times have they been replaced?"
Kamanue: "What are you saying to us?"
Muzan: "What are you saying to us? Go on, say it."
Kamanue: "My thoughts are being read. This is bad."
Muzan: "What's bad... Go on, say it."
Kamanue: "Forgive me, Muzan-sama. Please, please have mercy. I'm sorry, I'm sorry. I'm sorry..."
Muzan kills Kamanue
Wakuraba: "Why am I being killed for this? Just because I became a member of the Twelve Kizuki, why? Why?"
Muzan: "Are you more afraid of the Demon Slayers than me?"
Mukago: "No."
Muzan: "You're always thinking of running away when you encounter a Hashira, aren't you?"
Mukago: "No, I'm not. I will fight to the death for you."
Muzan: "You're contradicting me?"
Muzan kills Mukago
Wakuraba: "It's no use, it's over. My thoughts are being read, and I'll be killed whether I agree or disagree. There's no way I can win in battle. Then, the only option is to run!"
Enmu: "Foolish."
Wakuraba: "I have to escape somehow. If I just get this far away..."
Muzan kills Wakuraba
Muzan: "From now on, the Twelve Kizuki will only consist of the Upper Ranks. The Lower Rank demons will be disbanded!"
Wakuraba: "Am I being defeated? No, it's not like that. It must be the woman with the biwa's ability. No, I didn't hear the sound of the biwa. Why can't I regenerate my body?"
Muzan: "Is there anything else you want to say?"
Rokuro: "I can still be of use! If you give me a little more time, I will definitely be of use."
Muzan: "How much time exactly? What kind of use can you be? What can you do with your current power?"
Rokuro: "If you give me some of your blood, I will definitely adapt to it. I will become a stronger demon and fight."
Muzan: "Why should I give you my blood on your command? It's very presumptuous of you. Know your place."
Rokuro: "No, no, I..."
Muzan: "Shut up. Nothing will change. I never make mistakes. All the decision-making power lies with me, and what I say is absolute. You have no right to refuse. What I say is right. You commanded me. You deserve to die."
Muzan kills Rokuro
Muzan: "Is there anything else you want to say?"
Wakuraba: "This guy is going to be killed too. Everything depends on this guy's mood. I'm going to die too."
```
## SpeedUp Sample
unslothを使う事で精度をわずかに犠牲にして実行速度を上げる事ができます。
Using unsloth can increase execution speed at the expense of a small amount of accuracy.
```
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.43.3
pip install bitsandbytes==0.43.3
pip install accelerate==0.33.0
pip install peft==0.12.0
pip install flash-attn --no-build-isolation
pip install --upgrade pip
python -m pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"
```
```
import time
import torch
max_seq_length = 2048
load_in_4bit = True
dtype=torch.bfloat16
from unsloth import FastLanguageModel
adp_name = "webbigdata/C3TR-Adapter"
from transformers import TextStreamer
model_name = "unsloth/gemma-2-9b-it"
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
model, tokenizer = FastLanguageModel.from_pretrained(
adp_name,
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)
def trans(instruction, input):
system = """You are a highly skilled professional Japanese-English and English-Japanese translator. Translate the given text accurately, taking into account the context and specific instructions provided. Steps may include hints enclosed in square brackets [] with the key and value separated by a colon:. Only when the subject is specified in the Japanese sentence, the subject will be added when translating into English. If no additional instructions or context are provided, use your expertise to consider what the most appropriate context is and provide a natural translation that aligns with that context. When translating, strive to faithfully reflect the meaning and tone of the original text, pay attention to cultural nuances and differences in language usage, and ensure that the translation is grammatically correct and easy to read. After completing the translation, review it once more to check for errors or unnatural expressions. For technical terms and proper nouns, either leave them in the original language or use appropriate translations as necessary. Take a deep breath, calm down, and start translating."""
prompt = f"""{system}
<start_of_turn>### Instruction:
{instruction}
### Input:
{input}
<end_of_turn>
<start_of_turn>### Response:
"""
inputs = tokenizer(prompt, return_tensors="pt",
padding=True, max_length=2400, truncation=True).to("cuda")
from transformers import TextStreamer
class CountingStreamer(TextStreamer):
def __init__(self, tokenizer):
super().__init__(tokenizer)
self.tokenizer = tokenizer
self.token_count = 0
def put(self, text):
self.token_count += len(self.tokenizer.encode(text, add_special_tokens=False))
super().put(text)
def put(self, text):
if isinstance(text, torch.Tensor):
self.token_count += text.shape[-1]
elif isinstance(text, list):
self.token_count += len(text)
elif isinstance(text, str):
self.token_count += len(self.tokenizer.encode(text, add_special_tokens=False))
else:
raise TypeError(f"Unexpected type for text: {type(text)}")
super().put(text)
counting_streamer = CountingStreamer(tokenizer)
start_time = time.time()
_ = model.generate(**inputs, streamer = counting_streamer, max_new_tokens=2400,
#min_length=1000,
early_stopping=False)
end_time = time.time()
elapsed_time = end_time - start_time
generated_tokens = counting_streamer.token_count
tokens_per_second = generated_tokens / elapsed_time
print(f"generated_tokens: {generated_tokens}")
print(f"elapsed_time: {elapsed_time}")
tokens_per_second = generated_tokens / elapsed_time if elapsed_time > 0 else 0
print(f"トークン生成速度: {tokens_per_second:.2f} トークン/秒")
return tokens_per_second
tokens_per_second = trans("Translate English to Japanese.\nWhen translating, please use the following hints:\n[writing_style: journalistic]",
"""Tech war: China narrows AI gap with US despite chip restrictions
China is narrowing the artificial intelligence (AI) gap with the US through rapid progress in deploying applications and state-backed adoption of the technology, despite the lack of access to advanced chips, according to industry experts and analysts.
""")
```
## 留意事項 Attention
このアダプターをモデルとマージして保存すると性能が下がってしまう不具合が存在するため、**ベースモデル(unsloth/gemma-2-9b-it-bnb-4bit)とアダプターをマージして保存しないでください**
**Do not save this adapter merged with the base model(unsloth/gemma-2-9b-it-bnb-4bit)**, as there exists a bug that reduces performance when saving this adapter merged with the model.
どうしてもマージしたい場合は必ずPerplexityではなく、翻訳ベンチマークで性能を確認してから使うようにしてください
If you must merge, be sure to use a translation benchmark to check performance, not Perplexity!
### 利用規約 Terms of Use
本アダプターはApache License 2.0です。 gemma2と一緒に使用する場合は[Gemma License](https://ai.google.dev/gemma/terms)と[prohibited_use_policy](https://ai.google.dev/gemma/prohibited_use_policy)を考慮する必要があります。
This adapter is licensed under Apache License 2.0. If you use it with gemma2, you must consider the [Gemma License](https://ai.google.dev/gemma/terms) and [prohibited_use_policy](https://ai.google.dev/gemma/prohibited_use_policy).
加えて貴方に以下のお願いがあります。
Additionally, We have the following request to you.
私たちの以前のモデルであるALMA-7B-Ja-V2のダウンロード件数は15万件を超えているのですが、どんな人がどのような場面で使っているのか全く把握できていません。
Our previous model, ALMA-7B-Ja-V2, has over 150K downloads, but we have no idea who is using it and in what situations.
そのため、使用した後は[Googleフォームに感想や今後期待する方向性、気が付いた誤訳の例、参考にして欲しいデータの場所、Webサイトなどを是非とも記入](https://forms.gle/Ycr9nWumvGamiNma9)してください。
So, after you use it, please [fill out the Google form below with your impressions, future directions you expect us to take, examples of mistranslations you have noticed, and locations of data you would like us to reference, websites, etc.](https://forms.gle/Ycr9nWumvGamiNma9) by all means.
個人情報やメールアドレスは収集しないので、気軽にご記入をお願いします
We do not collect personal information or email address, so please feel free to fill out the form!
どんなご意見でも感謝します!
Any feedback would be appreciated!
### 謝辞 Acknowledgment
Original Base Model
google/gemma-2-9b-it
https://huggingface.co/google/gemma-2-9b-it
Base Model
unsloth/gemma-2-9b-it-bnb-4bit
https://huggingface.co/unsloth/gemma-2-9b-it-bnb-4bit
QLoRA Adapter
webbigdata/C3TR-Adapter
https://huggingface.co/webbigdata/C3TR-Adapter
This adapter was trained with Unsloth.
https://github.com/unslothai/unsloth
その他、[ALMA](https://arxiv.org/abs/2309.11674)をはじめ、コミュニティの皆さんからヒントを貰っています。ありがとう
Other tips I have received from [ALMA](https://arxiv.org/abs/2309.11674) and others in the community. Thank you.
- **Developed by:** [webbigdata](https://webbigdata.jp/) |