SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 110
nlp, few-shot learning, sentence transformers
export_static_quantized_openvino_model method to quantize a model.
prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.
SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later.
from_model2vec or with from_distillation where you do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
mine_hard_negatives docs: https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives
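
To make the prompts snippet above concrete, here is a minimal plain-Python sketch of what adding "query: " before each training query and "document: " before each training answer does to the training text. It deliberately does not call Sentence Transformers: the helper apply_prompts and the column names "query" and "answer" are hypothetical, chosen only for illustration, and the prompts argument in SentenceTransformerTrainingArguments is what would handle this step during real training.

```python
# Plain-Python illustration only; this does not call Sentence Transformers.
# The column names "query" and "answer" and the helper name are assumptions.
PROMPTS = {"query": "query: ", "answer": "document: "}


def apply_prompts(rows, prompts):
    """Return copies of rows with each column's prompt prepended to its text."""
    prefixed = []
    for row in rows:
        # Columns without a configured prompt are left unchanged.
        prefixed.append(
            {column: prompts.get(column, "") + text for column, text in row.items()}
        )
    return prefixed


train_rows = [
    {"query": "what is ndcg?", "answer": "NDCG is a ranking quality metric."},
]
prefixed_rows = apply_prompts(train_rows, PROMPTS)
print(prefixed_rows[0]["query"])   # query: what is ndcg?
print(prefixed_rows[0]["answer"])  # document: NDCG is a ranking quality metric.
```

A model trained on prefixed text would presumably need the same prefixes applied to queries and documents at inference time, so it helps to keep the prompt mapping in one place.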