toddric-3b-merged-v3

Type: Qwen2.5-3B-Instruct, LoRA merged (bf16)
Use: personal "Toddric" assistant with Todd's tone and workflows


TL;DR

  • Base: Qwen/Qwen2.5-3B-Instruct
  • Fine-tune: LoRA (r=32, alpha=64, dropout=0.05), seq=2048, epochs=2, grad-accum=16
  • Merge: weights merged into a single checkpoint for easy deployment
  • Recommended: attn_implementation="eager", left padding, use_cache=True
  • VRAM: comfortable on 8–12 GB GPUs (bf16)

Intended use

  • Persona-consistent writing (tweets, bios, emails, release notes)
  • Small engineering helpers (git/Docker/systemd snippets, "one fenced block" patterns; example below)
  • Summaries, plans, QA checklists, editing passes in a specific voice

Not intended for: factual retrieval from private sources, medical/legal advice, or impersonating private individuals.
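
As an example of the "one fenced block" helper pattern above, a prompt like this works (wording is illustrative, not taken from the acceptance suite):

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a systemd unit that runs /usr/local/bin/backup.sh nightly. Reply with exactly one fenced code block, no commentary."},
]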


How to load

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

mid = "toddie314/toddric-3b-merged-v3"
tok = AutoTokenizer.from_pretrained(mid, use_fast=True)
tok.padding_side = "left"
tok.pad_token = tok.pad_token or tok.eos_token  # fallback if pad is unset

model = AutoModelForCausalLM.from_pretrained(
    mid,
    device_map={"": 0},
    torch_dtype="bfloat16",
    attn_implementation="eager",
    low_cpu_mem_usage=True,
)
model.config.use_cache = True
model.generation_config.pad_token_id = tok.pad_token_id

msgs = [
  {"role":"system","content":"You are a helpful assistant."},
  {"role":"user","content":"Give me two tweets announcing Toddricโ€™s alpha; two lines only, no preamble."}
]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)

ids = tok([prompt], return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(**ids, max_new_tokens=200, do_sample=False, eos_token_id=tok.eos_token_id)
# decode only the newly generated tokens (skip the prompt)
print(tok.decode(out[0, ids['input_ids'].shape[-1]:], skip_special_tokens=True))

Evaluation (acceptance pack)

We use a 57-prompt persona acceptance suite (tweets, CIFS snippet, Mastodon, bios, release notes, 90-word micro-story, etc.) that checks both content constraints and latency and throughput (tok/s).

  • Result (bf16, RTX 4060 8 GB): 100% pass with attn_implementation="eager".
    Throughput varies by stack; typical local runs show ~15–19 tok/s on this card.

Run your own gate if speed matters on your hardware.
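
If you want a quick tok/s number on your own stack, a minimal timing sketch like the one below works, reusing the tok and model objects from the loading example; the prompt and token budget are arbitrary:

import time
import torch

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Write a two-line release note."}],
    tokenize=False, add_generation_prompt=True,
)
ids = tok([prompt], return_tensors="pt").to(model.device)

t0 = time.perf_counter()
with torch.inference_mode():
    out = model.generate(**ids, max_new_tokens=200, do_sample=False)
dt = time.perf_counter() - t0

# count only the newly generated tokens
new_tokens = out.shape[-1] - ids["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {dt:.2f}s -> {new_tokens / dt:.1f} tok/s")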


Training recipe (for reproducibility)

  • Base: Qwen/Qwen2.5-3B-Instruct
  • LoRA: r=32, alpha=64, dropout=0.05
  • Seq len: 2048
  • Optim: standard SFT (supervised) on curated data
  • Data sources (curated & de-identified):
    • Mail (Sent) → compose/reply patterns
    • ChatGPT logs (neutral mode) → project facts & tone
    • Memories → structured personal facts with stubs/fills
    • Fiction (EPUB/GDoc) → tone/voice micro-tasks (short spans, not full books)
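
For reproducibility, a minimal sketch of the adapter setup and final merge with peft; the target modules (the usual Qwen2-family projection layers) and the elided training loop are assumptions, not taken from the actual run:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype="bfloat16"
)

# r/alpha/dropout match the recipe above; target_modules is an assumption.
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# ... SFT here: seq len 2048, 2 epochs, grad-accum 16 ...

# Fold the adapter into the base weights -> single deployable checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("toddric-3b-merged-v3")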

Notes

  • Left padding matters for batched generation (decoder-only model).
  • Greedy defaults are shipped via generation_config.json; set do_sample=True if you want creativity (see the sketch after these notes).
  • Safety: the model refuses some unsafe requests and offers alternatives; always keep a human in the loop.
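
For the sampled mode mentioned above, a minimal variation on the earlier generate call; the temperature/top_p values are illustrative, not tuned for this model:

with torch.inference_mode():
    out = model.generate(
        **ids,
        max_new_tokens=200,
        do_sample=True,   # override the shipped greedy default
        temperature=0.7,  # illustrative, not tuned
        top_p=0.9,
    )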