# Model Card for GRPO-Enhanced SmolLM-135M-Instruct
This model extends SmolLM-135M-Instruct with enhancements for task-oriented language generation, emphasizing concise, well-formatted output produced via reinforcement learning with GRPO (Group Relative Policy Optimization).
## Model Details

### Model Description
This model, developed with the Transformers library, is fine-tuned from the HuggingFaceTB/SmolLM-135M-Instruct base model on the mlabonne/smoltldr dataset. It uses reward-based training to improve the specificity and formatting of generated text, making it well suited to applications that require structured output.
- Model type: Causal language model fine-tuned with TRL's GRPOTrainer
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: HuggingFaceTB/SmolLM-135M-Instruct
### Model Sources
- Repository: https://huggingface.co/eagle0504/finetune-mlabonne-smoltldr-using-HuggingFaceTB-SmolLM-135M-Instruct
## Uses

### Direct Use
This model generates structured text outputs directly, for applications such as automated reasoning or instructional content generation, with no additional fine-tuning required.
## Bias, Risks, and Limitations
The model may inherit biases from its training data and from the base model it was fine-tuned from. It may handle out-of-distribution inputs poorly and generate less accurate outputs in languages other than English.
### Recommendations
Continual monitoring and updating of the model with diverse datasets can help mitigate biases. Users should also validate model outputs before use in critical applications.
## How to Get Started with the Model
To get started, load the model in your Transformers-based pipeline using its model ID on the Hugging Face Hub; a minimal sketch follows. Ensure your environment supports bf16 for optimal performance.
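A minimal loading-and-generation sketch, assuming a recent Transformers version that accepts chat-style pipeline input (the prompt is illustrative):

```python
import torch
from transformers import pipeline

model_id = "eagle0504/finetune-mlabonne-smoltldr-using-HuggingFaceTB-SmolLM-135M-Instruct"

# Load the fine-tuned model in bf16 for efficient inference.
generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative chat-style prompt (the model is instruction-tuned).
messages = [
    {"role": "user", "content": "Give a one-sentence TL;DR of why small models benefit from reward-based fine-tuning."}
]
result = generator(messages, max_new_tokens=128)

# With chat input, generated_text holds the full message list; the last
# entry is the assistant's reply.
print(result[0]["generated_text"][-1]["content"])
```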
## Training Details

### Training Data
The model was trained on the mlabonne/smoltldr dataset, which consists of structured prompts and completions designed for text generation tasks.
### Training Procedure

#### Preprocessing
Data preprocessing used the tokenizer from the base model, with each example split into a prompt and a completion (sketched below).
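A minimal sketch of that split, assuming a generic prompt/completion record; the card does not publish the exact preprocessing script, so the field names here are illustrative:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")

# Illustrative record; the actual dataset schema may differ.
example = {
    "prompt": "Summarize the following post in one sentence: ...",
    "completion": "A one-sentence TL;DR of the post.",
}

# Prompt and completion are tokenized separately: GRPO-style training
# samples completions conditioned on the prompt.
prompt_ids = tokenizer(example["prompt"]).input_ids
completion_ids = tokenizer(example["completion"]).input_ids
print(len(prompt_ids), len(completion_ids))
```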
#### Training Hyperparameters
- Training regime: bf16 mixed precision
- Total training time: approximately 494 seconds
- Steps per second: 0.101
- Batch size: 1 (with gradient accumulation over 2 steps)
- Training loss: ~5.72e-05 (final average)
## Evaluation

### Testing Data, Factors & Metrics
Evaluated on a held-out portion of the mlabonne/smoltldr dataset.
### Results
The final average training loss was approximately 5.72e-05, suggesting highly effective learning under the configured reward functions.
## Environmental Impact
Training was brief (roughly 494 seconds) and was conducted on GPU hardware compatible with bf16 precision, keeping compute use and energy consumption low.
## Technical Specifications

### Model Architecture and Objective
The architecture is that of the base causal language model; the GRPO training objective turns sequence-level rewards into group-relative advantages that weight token-level policy updates, improving response quality in task-specific generations.
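For reference, the group-relative advantage at the core of GRPO, in its standard formulation (not specific to this card): given a group of $G$ sampled completions with rewards $r_1, \dots, r_G$, each completion's advantage is its reward standardized within the group:

$$
\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}
$$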
### Compute Infrastructure
Training was performed on cloud-based GPUs with bf16 support to ensure fast computation and reduced environmental impact.
## More Information
For more detailed documentation and usage examples, visit the repository link provided above.