ahmedheakl/arazn-llama3-english

How to use

First install peft, transformers, accelerate, bitsandbytes, and PyTorch:

pip install peft accelerate bitsandbytes transformers torch

Then log in with your Hugging Face token to get access to the base model:

huggingface-cli login --token <YOUR_HF_TOKEN>
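
The same login can also be done from Python, which may be convenient in notebooks. This is a minimal sketch, assuming the token has been exported in the HF_TOKEN environment variable. `login_from_env` is a hypothetical helper name; `huggingface_hub.login` is the library's own function.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub using a token read from the environment.

    Returns False, without contacting the Hub, when the variable is unset or empty.
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported lazily so the helper can be defined before huggingface_hub is installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` once before loading the model should be enough; a return value of False means the variable was not set and no login was attempted.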

Then load the model.

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

peft_model_id = "ahmedheakl/arazn-llama3-english"
peft_config = PeftConfig.from_pretrained(peft_model_id)
base_model_name = peft_config.base_model_name_or_path
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, peft_model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
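
bitsandbytes is installed above but the loading snippet does not use it. If GPU memory is tight, one option is to load the base model in 4-bit. This is a sketch under assumptions, not the authors' documented setup: the NF4 settings are common defaults, `load_4bit` is a hypothetical helper name, and the model card does not confirm translation quality under quantization.

```python
def load_4bit(peft_model_id: str = "ahmedheakl/arazn-llama3-english"):
    """Load the base model in 4-bit and attach the adapter (assumed settings)."""
    # Heavy imports are kept inside the function so defining it is cheap.
    import torch
    from peft import PeftConfig, PeftModel
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Assumption: NF4 quantization with bfloat16 compute, a common choice.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # Resolve the base model name from the adapter config, as in the snippet above.
    peft_config = PeftConfig.from_pretrained(peft_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        peft_config.base_model_name_or_path,
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base_model, peft_model_id)
    tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
    return model, tokenizer
```

With this helper, `model, tokenizer = load_4bit()` would stand in for the bfloat16 loading code; the rest of the usage stays the same.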

Then run inference:

import torch

raw_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Translate the following code-switched Arabic-English-mixed text to English only.<|eot_id|><|start_header_id|>user<|end_header_id|>

{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""


def inference(prompt) -> str:
    # Wrap the source text in the Llama 3 chat template defined above.
    prompt = raw_prompt.format(source=prompt)
    # Move inputs to the model's device rather than hard-coding "cuda".
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        **inputs,
        use_cache=True,
        num_return_sequences=1,
        max_new_tokens=100,
        # do_sample=True,
        num_beams=1,
        # temperature=0.7,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Special tokens are kept in the decoded text so the reply can be located.
    outputs = tokenizer.batch_decode(generated_ids)[0]
    torch.cuda.empty_cache()
    torch.cuda.synchronize()
    # Return only the text between the assistant header and the end-of-turn token.
    return outputs.split("assistant<|end_header_id|>\n\n")[-1].split("<|eot_id|>")[0]


print(inference("أنا أحب الbanana"))  # I love bananas
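
The prompt formatting and reply extraction used in the inference code can be tried without loading the model. The sketch below reproduces the same template and split logic and feeds it a simulated decoded string. `build_prompt`, `extract_translation`, and `fake_decoded` are hypothetical names for illustration, and the trailing `.strip()` is an addition not present in the code above.

```python
# Llama 3 chat template from the model card, written as one concatenated string.
RAW_PROMPT = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "Translate the following code-switched Arabic-English-mixed text to English only."
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def build_prompt(source: str) -> str:
    """Insert the code-switched source sentence into the chat template."""
    return RAW_PROMPT.format(source=source)


def extract_translation(decoded: str) -> str:
    """Return the assistant reply from text decoded with special tokens kept."""
    # Everything after the last assistant header is the model's reply.
    reply = decoded.split("assistant<|end_header_id|>\n\n")[-1]
    # Cut at the end-of-turn token and trim surrounding whitespace.
    return reply.split("<|eot_id|>")[0].strip()


# Simulated decoder output: the prompt followed by a reply and an end-of-turn token.
fake_decoded = build_prompt("أنا أحب الbanana") + "I love bananas<|eot_id|>"
print(extract_translation(fake_decoded))  # I love bananas
```

Because the extraction relies on the special tokens being present, it only works on text decoded without `skip_special_tokens=True`.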

Please see the paper and code for more information.

Citation

BibTeX:

@article{heakl2024arzen,
  title={ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs},
  author={Heakl, Ahmed and Zaghloul, Youssef and Ali, Mennatullah and Hossam, Rania and Gomaa, Walid},
  journal={arXiv preprint arXiv:2406.18120},
  year={2024}
}

Model Card Authors

Email: [email protected]
LinkedIn: https://linkedin.com/in/ahmed-heakl
