# toddric-3b-merged-v3
Type: Qwen2.5-3B-Instruct, LoRA merged (bf16)
Use: personal "Toddric" assistant with Todd's tone and workflows
## TL;DR
- Base: `Qwen/Qwen2.5-3B-Instruct`
- Fine-tune: LoRA (r=32, alpha=64, dropout=0.05), seq=2048, epochs=2, grad-accum=16
- Merge: weights merged into a single checkpoint for easy deployment (see the sketch after this list)
- Recommended: `attn_implementation="eager"`, left padding, `use_cache=True`
- VRAM: comfortable on 8–12 GB GPUs (bf16)
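The merge itself is the standard PEFT adapter merge. A minimal sketch, assuming a hypothetical local adapter directory `toddric-3b-lora` (the real adapter artifacts are not published here):

```python
# Minimal PEFT merge sketch. The adapter and output paths are placeholders,
# not the actual training artifacts.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype="bfloat16"
)
model = PeftModel.from_pretrained(base, "toddric-3b-lora")  # hypothetical adapter dir
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights
model.save_pretrained("toddric-3b-merged-v3")
```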
## Intended use
- Persona-consistent writing (tweets, bios, emails, release notes)
- Small engineering helpers (git/Docker/systemd snippets, "one fenced block" patterns)
- Summaries, plans, QA checklists, editing passes in a specific voice
Not intended for: factual retrieval from private sources, medical/legal advice, or impersonating private individuals.
## How to load

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

mid = "toddie314/toddric-3b-merged-v3"
tok = AutoTokenizer.from_pretrained(mid, use_fast=True)
tok.padding_side = "left"  # decoder-only: pad on the left for batched generation
tok.pad_token = tok.pad_token or tok.eos_token  # fallback if pad is unset

model = AutoModelForCausalLM.from_pretrained(
    mid,
    device_map={"": 0},
    torch_dtype="bfloat16",
    attn_implementation="eager",
    low_cpu_mem_usage=True,
)
model.config.use_cache = True
model.generation_config.pad_token_id = tok.pad_token_id

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me two tweets announcing Toddric's alpha; two lines only, no preamble."},
]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
ids = tok([prompt], return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**ids, max_new_tokens=200, do_sample=False, eos_token_id=tok.eos_token_id)
print(tok.decode(out[0, ids["input_ids"].shape[-1]:], skip_special_tokens=True))
```
## Evaluation (acceptance pack)
We use a 57-prompt persona acceptance suite (tweets, CIFS snippet, Mastodon, bios, release notes, 90-word micro-story, etc.) that checks both content constraints and latency/tok/s.
- Result (bf16, RTX 4060 8 GB): 100% pass with `attn="eager"`.
- Throughput varies by stack; typical local runs show ~15–19 tok/s on this card. Run your own gate if speed matters on your hardware.
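A minimal timing sketch for such a gate, reusing `tok` and `model` from the loading snippet above; the prompt and token budget here are arbitrary stand-ins, not the acceptance suite itself:

```python
# Rough throughput gate: time one greedy generation and report tokens/second.
# The prompt and max_new_tokens are arbitrary stand-ins, not the acceptance suite.
import time

gate_msgs = [{"role": "user", "content": "Write three short release-note bullets."}]
gate_prompt = tok.apply_chat_template(gate_msgs, tokenize=False, add_generation_prompt=True)
gate_ids = tok([gate_prompt], return_tensors="pt").to(model.device)

t0 = time.perf_counter()
with torch.inference_mode():
    gate_out = model.generate(**gate_ids, max_new_tokens=200, do_sample=False)
elapsed = time.perf_counter() - t0

new_tokens = gate_out.shape[-1] - gate_ids["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tok/s")
```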
## Training recipe (for reproducibility)
- Base: `Qwen/Qwen2.5-3B-Instruct`
- LoRA: r=32, alpha=64, dropout=0.05
- Seq len: 2048
- Optim: standard SFT (supervised) on curated data (see the config sketch after this list)
- Data sources (curated & de-identified):
  - Mail (Sent) → compose/reply patterns
  - ChatGPT logs (neutral mode) → project facts & tone
  - Memories → structured personal facts with stubs/fills
  - Fiction (EPUB/GDoc) → tone/voice micro-tasks (short spans, not full books)
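A minimal sketch of how the hyperparameters above map onto PEFT/Transformers config objects; the `target_modules` list and output path are assumptions, not taken from the actual run:

```python
# Illustrative reconstruction of the recipe above. target_modules and output_dir
# are assumptions; dataset loading and the SFT trainer wiring are omitted.
from peft import LoraConfig
from transformers import TrainingArguments

lora = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
)

args = TrainingArguments(
    output_dir="toddric-3b-lora",  # placeholder
    num_train_epochs=2,
    gradient_accumulation_steps=16,
    bf16=True,
)
# The 2048-token sequence length would be enforced at tokenization / by the trainer's max length.
```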
## Notes
- Left padding matters (decoder-only model; pad on the left for batched generation).
- Greedy defaults are shipped via `generation_config.json`; set `do_sample=True` if you want creativity (example below).
- Safety: the model refuses some unsafe requests and offers alternatives; always keep a human in the loop.
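For instance, to override the shipped greedy defaults at call time, reusing `ids` and `model` from the loading snippet above (the temperature/top-p values are illustrative):

```python
# Per-call sampling overrides; generation_config.json stays untouched.
with torch.inference_mode():
    out = model.generate(
        **ids,
        max_new_tokens=200,
        do_sample=True,   # enable sampling instead of greedy decoding
        temperature=0.7,  # illustrative values, tune to taste
        top_p=0.9,
    )
print(tok.decode(out[0, ids["input_ids"].shape[-1]:], skip_special_tokens=True))
```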