# RSCaLM-138M-LLaMA
RSCaLM (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for 20,000 steps. This run was conducted purely for experimental and benchmarking purposes, so downstream task quality is expected to be limited.
## Experiment Summary
- **Architecture:** LLaMA-style causal decoder
  - Rotary positional embeddings (RoPE)
  - Pre-normalization with RMSNorm
  - SwiGLU feed-forward layers
  - Multi-head self-attention with key-value caching support
- **Parameter count:** ~138M
- **Context length:** 2048 tokens
- **Tokenizer:** LLaMA tokenizer
- **Training framework:** PyTorch + Hugging Face Transformers
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Scheduler:** Cosine decay with warmup
- **Precision:** Mixed precision (FP16/BF16)
- **Batching:** Gradient accumulation to simulate a large effective batch size
- **Dataset:** General text corpus for pipeline validation (not domain-specific)
- **Steps completed:** 20,000 (~32% of the planned total)
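For reference, a comparable model and training setup could be assembled with Hugging Face `transformers` as sketched below. This is illustrative only: the hidden size, layer count, feed-forward width, learning rate, and warmup/total step counts are assumptions chosen to land near ~138M parameters, not the exact values used for this run.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, get_cosine_schedule_with_warmup

# Hypothetical dimensions chosen to give roughly 138M parameters;
# the exact sizes used for RSCaLM are not published in this card.
config = LlamaConfig(
    vocab_size=32000,              # LLaMA tokenizer vocabulary
    hidden_size=768,               # assumption
    intermediate_size=2048,        # SwiGLU feed-forward width (assumption)
    num_hidden_layers=12,          # assumption
    num_attention_heads=12,        # assumption
    max_position_embeddings=2048,  # context length
    rms_norm_eps=1e-5,             # RMSNorm pre-normalization
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Optimizer and scheduler as listed above; learning rate and step counts are placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=60000)
```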
## Validation Loss Progress

| Step | Val Loss |
|---|---|
| 1000 | 5.5968 |
| 2000 | 4.8513 |
| 5000 | 4.2105 |
| 10000 | 3.9603 |
| 15000 | 3.8497 |
| 20000 | 3.7891 |
Loss shows steady improvement over the limited training period.
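For a quick sense of scale, these losses can be converted to perplexity, assuming the reported value is the standard per-token cross-entropy in nats:

```python
import math

# Validation losses from the table above (step -> loss)
val_losses = {1000: 5.5968, 2000: 4.8513, 5000: 4.2105,
              10000: 3.9603, 15000: 3.8497, 20000: 3.7891}

for step, loss in val_losses.items():
    # perplexity = exp(cross-entropy loss in nats)
    print(f"step {step:>6}: loss {loss:.4f} -> perplexity {math.exp(loss):.1f}")
```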
## Notes

- This is an early prototype; it is not tuned for production use.
- Training stopped after ~32% of the planned total steps.
- Repetition loops may appear in generation; this is expected for low-step runs.
- Intended for research reference, not for deployment in critical tasks.
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Example Usage (with repetition control)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,            # limit length of output
    do_sample=True,                # enable sampling so temperature/top_p/top_k apply
    temperature=0.7,               # lower temperature = more focused
    top_p=0.9,                     # nucleus sampling
    top_k=50,                      # top-k filtering
    repetition_penalty=1.2,        # penalize repeated tokens
    no_repeat_ngram_size=3,        # prevent repeated trigrams
    eos_token_id=tokenizer.eos_token_id,  # end generation at EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Tips for controlling repetition

- `repetition_penalty`: increase slightly above `1.0` (e.g., `1.2`–`1.5`) to discourage repeated phrases.
- `no_repeat_ngram_size`: set to `3` or `4` to avoid repeated n-grams.
- `top_k` + `top_p`: combine both for better randomness control.
- Lower `temperature`: keeps outputs focused and less chaotic.
- Stop sequences: add specific words/phrases to halt generation early if needed (see the sketch below).
## License

apache-2.0