---
license: mit
language:
- vi
datasets:
- 5CD-AI/Vietnamese-cosmos-qa-gg-translated
base_model:
- Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
- text-generation-inference
---

# 🌟 BloomVN-0.5B-ppo

### A fine-tuned multilingual model for the Vietnamese language

## 📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset in order to evaluate veRL's performance and training behavior.

## 🔧 Method

The experiments were run with [veRL](https://github.com/volcengine/verl) and focused on:

- Implementing the PPO algorithm with a 0.5B-parameter model
- Running training experiments on a small dataset
- Testing veRL's ability to handle RL tasks
- Evaluating training efficiency and model behavior

This lightweight setup allowed us to assess veRL's performance in a controlled, small-scale environment.

## 📊 VMLU Benchmark

| EVALUATION DATE | STEM 🔬 | SOCIAL SCIENCE 🌍 | HUMANITIES 📚 | OTHERS 🎯 | AVG ⭐ |
|-----------------|---------|-------------------|---------------|-----------|--------|
| 07/02/2025      | 23.18   | 32.84             | 32.71         | 33.67     | 29.43  |

## 🤝 Contributors

Developed with ❤️ by [BlossomAI](https://github.com/BlossomAI)

---
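As background for the PPO method named in the Method section, the following is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It is not veRL's code and not the training script used for this model: the function name `ppo_clipped_loss`, its arguments, and the `clip_eps=0.2` default are assumptions chosen for illustration, and veRL's actual implementation may differ (for example in how it masks tokens or adds KL and value terms).

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over sampled actions.

    Hypothetical helper for illustration only; clip_eps=0.2 is an assumed
    default, not a value taken from this model's training run.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (lower) bound of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    # Negate because optimizers minimize, while PPO maximizes the surrogate.
    return -total / len(advantages)


# Usage: the new policy doubles the probability of an action with positive
# advantage, so the gain is capped at (1 + clip_eps) * advantage.
loss = ppo_clipped_loss([math.log(0.5)], [math.log(0.25)], [1.0])
```

The clipping removes the incentive to move the policy far from the one that generated the samples, which is why PPO tends to remain stable even on small models and small datasets.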

_Star ⭐️ this repo if you find it valuable!_