toddric-3b-merged-v3

Type: Qwen2.5-3B-Instruct, LoRA merged (bf16)
Use: personal "Toddric" assistant with Todd's tone and workflows


TL;DR

  • Base: Qwen/Qwen2.5-3B-Instruct
  • Fine-tune: LoRA (r=32, alpha=64, dropout=0.05), seq=2048, epochs=2, grad-accum=16
  • Merge: weights merged into a single checkpoint for easy deployment
  • Recommended: attn_implementation="eager", left padding, use_cache=True
  • VRAM: comfortable on 8–12 GB GPUs (bf16)

Intended use

  • Persona-consistent writing (tweets, bios, emails, release notes)
  • Small engineering helpers (git/Docker/systemd snippets, "one fenced block" patterns; example below)
  • Summaries, plans, QA checklists, editing passes in a specific voice

Not intended for: factual retrieval from private sources, medical/legal advice, or impersonating private individuals.
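
As an example of the "one fenced block" helper pattern above, a prompt like this works (wording is illustrative, not taken from the acceptance suite):

msgs = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a systemd unit that runs /usr/local/bin/backup.sh nightly. Reply with exactly one fenced code block, no commentary."},
]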


How to load

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

mid = "toddie314/toddric-3b-merged-v3"
tok = AutoTokenizer.from_pretrained(mid, use_fast=True)
tok.padding_side = "left"
tok.pad_token = tok.pad_token or tok.eos_token  # fallback if pad is unset

model = AutoModelForCausalLM.from_pretrained(
    mid,
    device_map={"": 0},
    torch_dtype="bfloat16",
    attn_implementation="eager",
    low_cpu_mem_usage=True,
)
model.config.use_cache = True
model.generation_config.pad_token_id = tok.pad_token_id

msgs = [
  {"role":"system","content":"You are a helpful assistant."},
  {"role":"user","content":"Give me two tweets announcing Toddricโ€™s alpha; two lines only, no preamble."}
]
prompt = tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)

ids = tok([prompt], return_tensors="pt").to(model.device)
with torch.inference_mode():
    out = model.generate(**ids, max_new_tokens=200, do_sample=False, eos_token_id=tok.eos_token_id)
# decode only the newly generated tokens (skip the prompt)
print(tok.decode(out[0, ids['input_ids'].shape[-1]:], skip_special_tokens=True))

Evaluation (acceptance pack)

We use a 57-prompt persona acceptance suite (tweets, CIFS snippet, Mastodon, bios, release notes, 90-word micro-story, etc.) that checks both content constraints and latency and throughput (tok/s).

  • Result (bf16, RTX 4060 8 GB): 100% pass with attn_implementation="eager".
    Throughput varies by stack; typical local runs show ~15–19 tok/s on this card.

Run your own gate if speed matters on your hardware.
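
If you want a quick tok/s number on your own stack, a minimal timing sketch like the one below works, reusing the tok and model objects from the loading example; the prompt and token budget are arbitrary:

import time
import torch

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Write a two-line release note."}],
    tokenize=False, add_generation_prompt=True,
)
ids = tok([prompt], return_tensors="pt").to(model.device)

t0 = time.perf_counter()
with torch.inference_mode():
    out = model.generate(**ids, max_new_tokens=200, do_sample=False)
dt = time.perf_counter() - t0

# count only the newly generated tokens
new_tokens = out.shape[-1] - ids["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {dt:.2f}s -> {new_tokens / dt:.1f} tok/s")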


Training recipe (for reproducibility)

  • Base: Qwen/Qwen2.5-3B-Instruct
  • LoRA: r=32, alpha=64, dropout=0.05
  • Seq len: 2048
  • Optim: standard SFT (supervised) on curated data
  • Data sources (curated & de-identified):
    • Mail (Sent) → compose/reply patterns
    • ChatGPT logs (neutral mode) → project facts & tone
    • Memories → structured personal facts with stubs/fills
    • Fiction (EPUB/GDoc) → tone/voice micro-tasks (short spans, not full books)
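
For reproducibility, a minimal sketch of the adapter setup and final merge with peft; the target modules (the usual Qwen2-family projection layers) and the elided training loop are assumptions, not taken from the actual run:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", torch_dtype="bfloat16"
)

# r/alpha/dropout match the recipe above; target_modules is an assumption.
lora_cfg = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

# ... SFT here: seq len 2048, 2 epochs, grad-accum 16 ...

# Fold the adapter into the base weights -> single deployable checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("toddric-3b-merged-v3")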

Notes

  • Left padding matters for batched generation (decoder-only model).
  • Greedy defaults are shipped via generation_config.json; set do_sample=True if you want creativity (see the sketch after these notes).
  • Safety: the model refuses some unsafe requests and offers alternatives; always keep a human in the loop.
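
For the sampled mode mentioned above, a minimal variation on the earlier generate call; the temperature/top_p values are illustrative, not tuned for this model:

with torch.inference_mode():
    out = model.generate(
        **ids,
        max_new_tokens=200,
        do_sample=True,   # override the shipped greedy default
        temperature=0.7,  # illustrative, not tuned
        top_p=0.9,
    )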