---
datasets:
- lmg-anon/VNTL-v3.1-1k
language:
- en
- ja
base_model:
- Qwen/Qwen2.5-7B
pipeline_tag: translation
tags:
- translation
- vntl
- qwen
license: apache-2.0
library_name: transformers
---

# Qwen2.5-7B-VNTL-JP-EN

Qwen2.5-7B finetuned for Japanese-to-English translation, trained on ~150k sentence pairs from [VNTL-v3.1-1k](https://huggingface.co/datasets/lmg-anon/VNTL-v3.1-1k).

The model was trained on standalone sentences in random order to make it more flexible and useful beyond visual novel translation.

## Usage

### Ollama

1. `ollama run technobyte/Qwen2.5-7B-VNTL-JP-EN:q4_k_m`
2. Input just the Japanese sentence.

### Llama.cpp

1. Download the [GGUF](https://huggingface.co/TechnoByte/Qwen2.5-7B-VNTL-JP-EN-GGUF/tree/main).
2. `llama-cli -m Qwen2.5-7B-VNTL-JP-EN-Q4_K_M.gguf -no-cnv -p "A Japanese sentence along with a proper English equivalent.\nJapanese: 放課後はマンガ喫茶でまったり〜♡ おすすめのマンガ教えて!\nEnglish: "`

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TechnoByte/Qwen2.5-7B-VNTL-JP-EN"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The chat template wraps the sentence in the model's translation prompt.
messages = [
    {"role": "user", "content": "放課後はマンガ喫茶でまったり〜♡ おすすめのマンガ教えて!"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens, keeping only the generated translation.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## Prompt template

### Plaintext

```
A Japanese sentence along with a proper English equivalent.
Japanese: JAPANESE SENTENCE HERE
English: 
```

### Jinja (HF Transformers)

```jinja
{% for i in range(0, messages|length, 2) %}A Japanese sentence along with a proper English equivalent.
Japanese: {{ messages[i].content }}
English:{% if i+1 < messages|length %} {{ messages[i+1].content }}<|endoftext|>{{ " " }}{% else %}{% endif %}{% endfor %}
```

### Go (Ollama)

```
A Japanese sentence along with a proper English equivalent.
Japanese: {{ .Prompt }}
English: {{ .Response }}<|endoftext|>
```

## Limitations

- Can only translate one sentence per turn.
- May use incorrect pronouns due to lack of context.
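Since the model translates one sentence per turn, longer text has to be fed in sentence by sentence. The sketch below shows one way to do that with the plaintext prompt template above: a naive splitter on Japanese sentence-ending punctuation plus a prompt builder. The `split_sentences` and `build_prompts` helpers are hypothetical illustrations, not part of the model or its tooling; feed each resulting prompt to the model (via llama.cpp, Ollama, or Transformers) one at a time.

```python
import re

# The plaintext prompt template from this card, filled in per sentence.
PROMPT = ("A Japanese sentence along with a proper English equivalent.\n"
          "Japanese: {sentence}\n"
          "English: ")

def split_sentences(text: str) -> list[str]:
    # Naive split after Japanese (and ASCII) sentence-ending punctuation.
    # A hypothetical helper for illustration only.
    parts = re.split(r"(?<=[。!?!?])", text)
    return [p.strip() for p in parts if p.strip()]

def build_prompts(text: str) -> list[str]:
    # One prompt per sentence, matching the one-sentence-per-turn limit.
    return [PROMPT.format(sentence=s) for s in split_sentences(text)]

prompts = build_prompts("駅はどこですか?ありがとう!")
# Two prompts, each ending with "English: " for the model to complete.
```

Note that translating sentences independently this way gives up cross-sentence context, so the pronoun limitation above still applies.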