Sasak-translite-v1



A specialized Sasak language model for translation and transliteration tasks

🌟 Model Description

  • Developed by: Tanwir
  • Languages: Indonesian and Sasak

Sasak is a regional language spoken by the Sasak people on the island of Lombok, West Nusa Tenggara. It has several main dialects, such as Ngeno-Ngene, Meno-Mene, and Ngeto-Ngete, which reflect the cultural and geographic diversity of its speakers. The structure of Sasak is influenced by Balinese and Malay, but it has its own distinctive vocabulary, pronunciation, and grammar. The language is used in everyday communication, traditional ceremonies, and oral literature such as tembang and pepaosan, making it an important part of the cultural identity of the people of Lombok.

The model supports two tasks:

  1. Sasak to Indonesian Translation - Translating Sasak text into Indonesian
  2. Indonesian to Sasak Translation - Translating Indonesian text into Sasak


📊 Model Specifications

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| LoRA+ LR Ratio | 8 |
| Sequence Length | 512 tokens |
| Training Epochs | 3 |
| Learning Rate | 5e-5 |
| Batch Size | 2 (micro) × 4 (gradient accumulation) |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
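A few derived quantities follow from the values above. The sketch below is only illustrative arithmetic: the `TrainingSpec` class and its method names are hypothetical, the device count is an assumption (the card does not state it), and the scaling formula assumes standard LoRA (alpha divided by rank) and the usual LoRA+ convention of training the B matrices at the ratio times the base learning rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSpec:
    """Hyperparameters copied from the Model Specifications table."""

    lora_rank: int = 8
    lora_alpha: int = 16
    loraplus_lr_ratio: float = 8.0
    learning_rate: float = 5e-5
    micro_batch_size: int = 2
    grad_accum_steps: int = 4

    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # num_devices is an assumption; the card does not state the device count.
        return self.micro_batch_size * self.grad_accum_steps * num_devices

    def lora_b_learning_rate(self) -> float:
        # LoRA+ convention: B matrices use ratio * base learning rate.
        return self.learning_rate * self.loraplus_lr_ratio


spec = TrainingSpec()
print("LoRA scaling:", spec.lora_scaling())
print("Effective batch size:", spec.effective_batch_size())
print("LoRA+ B-matrix learning rate:", spec.lora_b_learning_rate())
```

On a single device this gives 8 sequences per optimizer step; with more devices the effective batch size grows proportionally.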

🛠️ Usage Examples

1. Indonesian to Sasak Translation

Make sure to update your transformers installation with pip install --upgrade transformers.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kodetr/sasak-translite-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare the conversation
messages = [
    {
        "content": "You are a Sasak language translation expert. Translate the given Indonesian text into the Sasak language.",
        "role": "system"
    },
    {
        "content": "Translate this Indonesian text to Sasak: aturan",
        "role": "user"
    }
]

# Apply chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
# do_sample=True is set explicitly so that temperature takes effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print(response)
# Prints the model's Sasak translation of the input text

2. Sasak to Indonesian Translation

Make sure to update your transformers installation with pip install --upgrade transformers.

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "kodetr/sasak-translite-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prepare the conversation
messages = [
    {
        "content": "You are an Indonesian translation expert. Translate the given Sasak text into Indonesian language.",
        "role": "system"
    },
    {
        "content": "Translate this Sasak text to Indonesian: titi tate",
        "role": "user"
    }
]

# Apply chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
# do_sample=True is set explicitly so that temperature takes effect
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

print(response)
# Prints the model's Indonesian translation of the input text
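The two examples differ only in their system and user prompts, so the message construction can be factored into one helper. This is a minimal sketch, not part of the model's API: `build_messages` and the direction keys `"id-to-sasak"` and `"sasak-to-id"` are hypothetical names, while the prompt strings are copied from the examples above.

```python
from typing import Dict, List, Tuple

# (system prompt, user prompt prefix) for each direction.
# The direction keys are hypothetical; the prompt text comes from the examples.
PROMPTS: Dict[str, Tuple[str, str]] = {
    "id-to-sasak": (
        "You are a Sasak language translation expert. "
        "Translate the given Indonesian text into the Sasak language.",
        "Translate this Indonesian text to Sasak: ",
    ),
    "sasak-to-id": (
        "You are an Indonesian translation expert. "
        "Translate the given Sasak text into Indonesian language.",
        "Translate this Sasak text to Indonesian: ",
    ),
}


def build_messages(text: str, direction: str) -> List[Dict[str, str]]:
    """Build the chat messages for one translation request."""
    if direction not in PROMPTS:
        raise ValueError(f"Unknown direction: {direction!r}")
    system_prompt, user_prefix = PROMPTS[direction]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prefix + text.strip()},
    ]


messages = build_messages("aturan", "id-to-sasak")
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` exactly as in the examples above.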

Built with ❤️ for Sasak language preservation and education

Built with Kodetr
