🧠 Model Card: Sam‑2.0

📌 Model Overview

Sam‑2.0 is a minimal, modular, decoder‑only Transformer architecture designed for chat‑style reasoning tasks.
It emphasizes reproducibility, ablation‑friendly design, and clean benchmarking across input modalities.

  • Architecture: Decoder‑only Transformer with RMSNorm, SwiGLU feed‑forward, and causal masking
  • Training Objective: Causal language modeling (CLM) with role‑based label masking
  • Checkpoint: sam2-epoch35.safetensors
  • Final Train Loss: 1.04
  • Validation Loss: Not tracked in this run
  • Training Duration: ~6,272 seconds (~1.7 hours) over 35 epochs
  • Framework: PyTorch + Hugging Face Transformers (custom model class)

🧱 Model Architecture

| Component | Description |
| --- | --- |
| Backbone | Decoder‑only Transformer stack |
| Normalization | RMSNorm |
| Attention | Multi‑head self‑attention (causal) |
| Feed‑Forward | SwiGLU activation with dropout |
| Positional Bias | Learned absolute positions (no RoPE in this minimal variant) |
| Head | Tied‑embedding LM head |
| Checkpoint Format | safetensors with metadata for reproducibility |
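
For reference, the normalization and feed‑forward blocks in the table above can be sketched as follows. This is an illustrative PyTorch sketch, not the actual Sam‑2.0 source; class names, dimensions, and the dropout rate are placeholders.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # Normalize by the RMS over the hidden dimension, then rescale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward: SiLU-gated linear unit with dropout."""
    def __init__(self, dim: int, hidden: int, dropout: float = 0.1):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # silu(gate(x)) * up(x), projected back to the model dimension.
        return self.down(self.drop(nn.functional.silu(self.gate(x)) * self.up(x)))
```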

🧪 Training Details

  • Dataset: pfb30/multi_woz_v22
  • Batch Size: 8
  • Optimizer: AdamW
  • Learning Rate: 2 × 10⁻⁴ (constant in this run)
  • Loss Function: Cross‑entropy over assistant tokens only (see the masking sketch after this list)
  • Hardware: Kaggle GPU runtime
  • Logging: Step‑wise loss tracking, no validation during training
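
The assistant‑only cross‑entropy is obtained via role‑based label masking: every position outside an assistant turn gets the ignore index, so it contributes no loss. A minimal sketch, assuming the <|assistant|> and <|eot|> special tokens shown in the usage example below; the helper name and exact span logic are illustrative, not the training script's code.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are skipped by F.cross_entropy

def mask_labels(input_ids, assistant_id, eot_id):
    """Keep loss only on assistant replies (tokens after <|assistant|>, up to and
    including <|eot|>). input_ids: 1D LongTensor for a single conversation."""
    labels = input_ids.clone()
    keep = torch.zeros_like(input_ids, dtype=torch.bool)
    inside = False
    for i, tok_id in enumerate(input_ids.tolist()):
        if tok_id == assistant_id:
            inside = True            # loss begins after the <|assistant|> tag
        elif tok_id == eot_id:
            keep[i] = inside         # learn to emit <|eot|> at the end of a reply
            inside = False
        elif inside:
            keep[i] = True
    labels[~keep] = IGNORE_INDEX
    return labels

# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
#                        ignore_index=IGNORE_INDEX)  # labels shifted by one vs. inputs inside the model
```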

📊 Evaluation

| Metric | Value | Notes |
| --- | --- | --- |
| Final Train Loss | 1.04 | Achieved at epoch 35/35 |
| Validation Loss | N/A | Not tracked in this run |
| Inference Speed | Fast | Lightweight architecture |
| Generalisation | TBD | To be compared against Sam‑2.5 |
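
Because no validation split was evaluated during training, held‑out loss and perplexity have to be measured externally. A rough sketch, assuming pre‑tokenized (input_ids, labels) pairs with the same -100 masking used for training; function and variable names are placeholders.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def heldout_perplexity(model, batches):
    """batches: iterable of (input_ids, labels) tensor pairs, labels masked with -100."""
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for input_ids, labels in batches:
        logits = model(input_ids)                          # (batch, seq, vocab)
        shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
        shift_labels = labels[:, 1:].reshape(-1)
        loss = F.cross_entropy(shift_logits, shift_labels,
                               ignore_index=-100, reduction="sum")
        total_loss += loss.item()
        total_tokens += (shift_labels != -100).sum().item()
    return math.exp(total_loss / max(total_tokens, 1))
```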

🔧 Intended Use

  • Research: Benchmarking modular architectures and ablation studies
  • Education: Reasoning scaffolds and logic quizzes
  • Deployment: Lightweight agents for chat and dialogue modeling

🚫 Limitations

  • No validation tracking — generalisation must be inferred via external harnesses
  • Trained on MultiWOZ v2.2 only — may not generalize to other domains without fine‑tuning
  • Minimal architecture — no RoPE/MQA in this variant

📁 Files

  • sam2-epoch35.safetensors — final checkpoint
  • config.json — architecture and training config
  • tokenizer.json — tokenizer with special tokens
  • README.md — training logs and setup instructions
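
The metadata embedded in the checkpoint can be inspected without instantiating the model, using the standard safetensors API; what the metadata actually contains depends on what was written at save time.

```python
from safetensors import safe_open

with safe_open("sam2-epoch35.safetensors", framework="pt") as f:
    print(f.metadata())        # reproducibility metadata stored at save time, if any
    print(list(f.keys())[:5])  # a few tensor names from the checkpoint
```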

🧩 How to Load

```python
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

from sam2 import Sam2, Sam2Config  # your custom model class

tok = AutoTokenizer.from_pretrained("Smilyai-labs/Sam-2.0")
cfg = Sam2Config(**json.load(open("config.json")))
model = Sam2(cfg)
# safetensors checkpoints are loaded with load_file, not torch.load
state = load_file("sam2-epoch35.safetensors", device="cpu")
model.load_state_dict(state)
model.eval()

# Greedy decoding with the chat-style role tokens used during training.
prompt = "<|user|> Hello! <|eot|>\n<|assistant|>"
ids = tok.encode(prompt, return_tensors="pt")
with torch.no_grad():
    for _ in range(50):
        logits = model(ids)  # assumes the model returns raw logits (batch, seq, vocab)
        next_id = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tok.eos_token_id:
            break

print(tok.decode(ids[0]))
```
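
The loop above decodes greedily; swapping the argmax for temperature sampling is a small change (the temperature value here is illustrative):

```python
# Sample from the softmax at temperature 0.8 instead of taking the argmax.
probs = torch.softmax(logits[:, -1, :] / 0.8, dim=-1)
next_id = torch.multinomial(probs, num_samples=1)
```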