RSCaLM-138M-LLaMA

RSCaLM (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for 20,000 steps. This run was conducted purely for experimental and benchmarking purposes, with no expectation of strong downstream task quality.


📌 Experiment Summary

  • Architecture: LLaMA-style causal decoder

    • Rotary positional embeddings (RoPE)
    • Pre-normalization with RMSNorm
    • SwiGLU feed-forward layers
    • Multi-head self-attention with key-value caching support
  • Parameter Count: ~138M

  • Context Length: 2048 tokens

  • Tokenizer: LLaMA tokenizer

  • Training Framework: PyTorch + Hugging Face Transformers

  • Optimizer: AdamW (β1=0.9, β2=0.95, weight decay=0.1)

  • Scheduler: Cosine decay with warmup

  • Precision: Mixed-precision (FP16/BF16)

  • Batching: Gradient accumulation to simulate a large effective batch size (see the training-setup sketch after this list)

  • Dataset: General text corpus for pipeline validation (not domain-specific)

  • Steps Completed: 20,000 (~32% of planned total)
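
A minimal training-setup sketch, assuming the hyperparameters listed above; this is not the actual training script. It loads the published config to inspect the architecture, then builds AdamW with β1=0.9, β2=0.95 and weight decay 0.1, a cosine schedule with warmup, gradient accumulation, and FP16 mixed precision. The learning rate, warmup length, accumulation factor, and train_dataloader are illustrative placeholders that do not come from this card.

import torch
from transformers import AutoConfig, AutoModelForCausalLM, get_cosine_schedule_with_warmup

model_id = "yasserrmd/RSCaLM-138M-LLaMA"

# Inspect the architecture hyperparameters (hidden size, layer count, RoPE settings, 2048-token context).
config = AutoConfig.from_pretrained(model_id)
print(config)

model = AutoModelForCausalLM.from_config(config).cuda()

# AdamW as listed above; the learning rate itself is a placeholder.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)

# Cosine decay with warmup; ~62,500 total steps follows from 20,000 being ~32% of the plan.
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=62500)

accum_steps = 8                          # gradient accumulation to simulate a larger batch
scaler = torch.cuda.amp.GradScaler()     # FP16 loss scaling; BF16 training would not need this

model.train()
for step, batch in enumerate(train_dataloader):   # train_dataloader: user-provided DataLoader
    with torch.autocast("cuda", dtype=torch.float16):
        loss = model(**batch, labels=batch["input_ids"]).loss / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()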


📉 Validation Loss Progress

Step      Val Loss
 1,000    5.5968
 2,000    4.8513
 5,000    4.2105
10,000    3.9603
15,000    3.8497
20,000    3.7891

Validation loss improves steadily over the 20,000 completed steps.


⚠️ Notes

  • This is an early prototype; it is not tuned for production use.
  • Training stopped after ~32% of the planned total steps.
  • Repetition loops may appear in generated text; this is expected for low-step runs.
  • Intended as a research reference, not for deployment in critical tasks.

🔧 Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

🔧 Example Usage (with repetition control)

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,        # Limit length of output
    do_sample=True,            # Enable sampling so temperature/top-p/top-k take effect
    temperature=0.7,           # Lower temperature = more focused
    top_p=0.9,                  # Nucleus sampling
    top_k=50,                   # Top-K filtering
    repetition_penalty=1.2,     # Penalize repeating tokens
    no_repeat_ngram_size=3,     # Prevent repeating trigrams
    eos_token_id=tokenizer.eos_token_id,  # End generation at EOS
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

💡 Tips for controlling repetition:

  1. repetition_penalty – Increase slightly above 1.0 (e.g., 1.2–1.5) to discourage repeated phrases.
  2. no_repeat_ngram_size – Set to 3 or 4 to avoid repeated n-grams.
  3. top_k + top_p – Combine both for better randomness control.
  4. Lower temperature – Keeps outputs focused and less chaotic.
  5. Stop sequences – Add specific words/phrases to halt generation early if needed (see the sketch after this list).
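
A minimal sketch of tip 5, assuming the transformers StoppingCriteria interface and reusing model, tokenizer, and inputs from the snippet above; the StopOnSubstring class and the "\n\n" stop string are hypothetical examples, not part of this model's API.

from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    """Stop once `stop_string` appears in the text generated after the prompt."""
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores, **kwargs):
        new_text = self.tokenizer.decode(input_ids[0, self.prompt_len:], skip_special_tokens=True)
        return self.stop_string in new_text

prompt_len = inputs["input_ids"].shape[1]
stopping = StoppingCriteriaList([StopOnSubstring(tokenizer, "\n\n", prompt_len)])

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    stopping_criteria=stopping,   # halt as soon as the stop string is generated
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))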

📜 License

apache-2.0
