# RSCaLM-138M-LLaMA
RSCaLM (Research Scale Causal Language Model) is an experimental 138M-parameter LLaMA-architecture model trained for 20,000 steps. This run was conducted purely for experimental and benchmarking purposes, so downstream task quality is expected to be limited.
## Experiment Summary
- **Architecture:** LLaMA-style causal decoder
  - Rotary positional embeddings (RoPE)
  - Pre-normalization with RMSNorm
  - SwiGLU feed-forward layers
  - Multi-head self-attention with key-value caching support
- **Parameter count:** ~138M
- **Context length:** 2048 tokens
- **Tokenizer:** LLaMA tokenizer
- **Training framework:** PyTorch + Hugging Face Transformers
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Scheduler:** Cosine decay with warmup
- **Precision:** Mixed precision (FP16/BF16)
- **Batching:** Gradient accumulation to simulate a large effective batch size
- **Dataset:** General text corpus for pipeline validation (not domain-specific)
- **Steps completed:** 20,000 (~32% of the planned total)
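For reference, a comparable model and training setup could be assembled with Hugging Face `transformers` as sketched below. This is illustrative only: the hidden size, layer count, feed-forward width, learning rate, and warmup/total step counts are assumptions chosen to land near ~138M parameters, not the exact values used for this run.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, get_cosine_schedule_with_warmup

# Hypothetical dimensions chosen to give roughly 138M parameters;
# the exact sizes used for RSCaLM are not published in this card.
config = LlamaConfig(
    vocab_size=32000,              # LLaMA tokenizer vocabulary
    hidden_size=768,               # assumption
    intermediate_size=2048,        # SwiGLU feed-forward width (assumption)
    num_hidden_layers=12,          # assumption
    num_attention_heads=12,        # assumption
    max_position_embeddings=2048,  # context length
    rms_norm_eps=1e-5,             # RMSNorm pre-normalization
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")

# Optimizer and scheduler as listed above; learning rate and step counts are placeholders.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95), weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=60000)
```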
## Validation Loss Progress

| Step | Val Loss |
|---|---|
| 1000 | 5.5968 |
| 2000 | 4.8513 |
| 5000 | 4.2105 |
| 10000 | 3.9603 |
| 15000 | 3.8497 |
| 20000 | 3.7891 |
Loss shows steady improvement over the limited training period.
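For a quick sense of scale, these losses can be converted to perplexity, assuming the reported value is the standard per-token cross-entropy in nats:

```python
import math

# Validation losses from the table above (step -> loss)
val_losses = {1000: 5.5968, 2000: 4.8513, 5000: 4.2105,
              10000: 3.9603, 15000: 3.8497, 20000: 3.7891}

for step, loss in val_losses.items():
    # perplexity = exp(cross-entropy loss in nats)
    print(f"step {step:>6}: loss {loss:.4f} -> perplexity {math.exp(loss):.1f}")
```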
## Notes

- This is an early prototype; it is not tuned for production use.
- Training stopped after ~32% of the planned total steps.
- Repetition loops may appear in generation; this is expected for low-step runs.
- Intended for research reference, not for deployment in critical tasks.
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "The sun is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Example Usage (with repetition control)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yasserrmd/RSCaLM-138M-LLaMA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "when a man goes to fishing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation settings to reduce repetition
outputs = model.generate(
    **inputs,
    max_new_tokens=100,            # limit length of output
    do_sample=True,                # enable sampling so temperature/top_p/top_k apply
    temperature=0.7,               # lower temperature = more focused
    top_p=0.9,                     # nucleus sampling
    top_k=50,                      # top-k filtering
    repetition_penalty=1.2,        # penalize repeated tokens
    no_repeat_ngram_size=3,        # prevent repeated trigrams
    eos_token_id=tokenizer.eos_token_id,  # end generation at EOS
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Tips for controlling repetition

- `repetition_penalty`: increase slightly above `1.0` (e.g., `1.2`–`1.5`) to discourage repeated phrases.
- `no_repeat_ngram_size`: set to `3` or `4` to avoid repeated n-grams.
- `top_k` + `top_p`: combine both for better randomness control.
- Lower `temperature`: keeps outputs focused and less chaotic.
- Stop sequences: add specific words/phrases to halt generation early if needed (see the sketch below).
## License

apache-2.0