---
base_model: meta-llama/Meta-Llama-3-8B-Instruct
library_name: peft
datasets:
- lordjia/Cantonese_English_Translation
---
# MISHANM/Cantonese_eng_text_generation_Llama3_8B_instruction
This model has been fine-tuned for Cantonese. It handles question answering in Cantonese as well as translation between English and Cantonese, producing context-sensitive responses that capture Cantonese nuances and remain reliable across a range of scenarios.
## Model Details
1. Language: Cantonese
2. Tasks: Question Answering (Cantonese to Cantonese), Translation (English to Cantonese)
3. Base Model: meta-llama/Meta-Llama-3-8B-Instruct
## Training Details
The model was trained on approximately 109,942 instruction samples.
1. GPUs: 4x AMD Radeon™ PRO V620
2. Training Time: 61:07:36 (hh:mm:ss)
## Inference with HuggingFace
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_path = "MISHANM/Cantonese_eng_text_generation_Llama3_8B_instruction"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=500, temperature=0.9):
    messages = [
        {
            "role": "system",
            "content": "You are a Cantonese language expert and linguist; respond in Cantonese.",
        },
        {"role": "user", "content": prompt},
    ]
    # Use the tokenizer's built-in Llama 3 chat template and append the
    # assistant header so the model continues as the assistant
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(
        inputs, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example usage
# ("He takes taxis everywhere every day, as if a few million were nothing.")
prompt = """佢日日搭的士出入,好似幾百萬未開頭噉。"""
translated_text = generate_text(prompt)
print(translated_text)
```
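The Model Details section lists two tasks (Cantonese question answering and translation into Cantonese). A minimal helper for building the corresponding chat-message lists is sketched below; the system/user prompt wording is an illustrative assumption, not a format prescribed by this model card:

```python
def build_messages(text, task="translate"):
    """Build a chat-style message list for the two supported tasks.

    The prompt wording here is an illustrative assumption, not
    a format prescribed by the model card.
    """
    system = "You are a Cantonese language expert and linguist; respond in Cantonese."
    if task == "translate":
        # English source text to be translated into Cantonese
        user = f"Translate the following English text into Cantonese:\n{text}"
    elif task == "qa":
        # Cantonese question, answered in Cantonese
        user = text
    else:
        raise ValueError(f"unknown task: {task!r}")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# The returned list can be passed to tokenizer.apply_chat_template(...)
# or formatted into a prompt string as in the example above.
print(build_messages("Good morning", task="translate")[1]["content"])
```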
## Citation Information
```
@misc{MISHANM/Cantonese_eng_text_generation_Llama3_8B_instruction,
author = {Mishan Maurya},
  title = {Introducing a Fine-Tuned LLM for the Cantonese Language},
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face repository},
}
```
- PEFT 0.12.0