---
license: mit
language:
- vi
datasets:
- 5CD-AI/Vietnamese-cosmos-qa-gg-translated
base_model:
- Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
- text-generation-inference
---
# 🌟 BloomVN-0.5B-ppo
### A fine-tuned multilingual model for the Vietnamese language
## 📋 Overview
This model is a small-scale experiment (0.5B parameters) that tests the Reinforcement Learning capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset to evaluate veRL's performance and training behavior.
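
Since the card metadata lists `transformers` as the library, the model can presumably be loaded like any other causal language model. The snippet below is a minimal sketch under that assumption, not an official usage recipe: the helper names `build_prompt` and `generate_answer`, the Vietnamese prompt layout, and the `model_id` argument are all hypothetical, and you need to pass this repository's actual Hub ID yourself.

```python
def build_prompt(context: str, question: str) -> str:
    """Join a passage and a question into one plain-text prompt.

    The layout is an assumption for illustration; the card does not
    document the prompt format used during training.
    """
    return f"{context.strip()}\n\nCâu hỏi: {question.strip()}\nTrả lời:"


def generate_answer(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for `prompt` with a causal LM from the Hub."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer(model_id, build_prompt(context, question))` downloads the weights on first use, so `model_id` must point to a checkpoint you can access (for example the base model `Qwen/Qwen2.5-0.5B` named in the metadata, or this repository's ID).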
## 🔧 Method
The experiments were run with [veRL](https://github.com/volcengine/verl) and focused on:
- Implementing the PPO algorithm with a 0.5B-parameter model
- Running training experiments on a small dataset
- Testing veRL's ability to handle RL tasks
- Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
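
For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective in plain Python. It illustrates the general technique only and is not veRL's implementation; the function name `ppo_clipped_loss`, its arguments, and the default `clip_eps=0.2` are assumptions chosen for the example, not settings reported for this model.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    new_logprobs: log-probabilities of the taken actions under the current policy
    old_logprobs: log-probabilities of the same actions under the rollout policy
    advantages:   advantage estimates for those actions
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not new_logprobs:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the current and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) objective, negated to obtain a loss.
        total += -min(ratio * adv, clipped * adv)
    return total / len(new_logprobs)
```

With a positive advantage, clipping caps how much the loss rewards moving the ratio above `1 + clip_eps`; with a negative advantage, the unclipped term dominates once the ratio exceeds that bound, so the penalty is not capped.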
## 📊 VMLU Benchmark
| EVALUATION DATE | STEM 🔬 | SOCIAL SCIENCE 🌍 | HUMANITIES 📚 | OTHERS 🎯 | AVG ⭐ |
|----------------|--------|------------------|---------------|-----------|--------|
| 07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43 |
## 🤝 Contributors
Developed with ❤️ by [BlossomAI](https://github.com/BlossomAI)
---
Star ⭐️ this repo if you find it valuable!