metadata

license: mit
language:
  - vi
datasets:
  - 5CD-AI/Vietnamese-cosmos-qa-gg-translated
base_model:
  - Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
  - text-generation-inference

🌟 BloomVN-0.5B-ppo

A fine-tuned multilingual model for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It applies PPO (Proximal Policy Optimization) to a limited training dataset in order to evaluate veRL's performance and training behavior.

🔧 Method

The experimentation process was conducted using veRL, focusing on:

- Implementing the PPO algorithm with a 0.5B-parameter model
- Running training experiments on a small dataset
- Testing veRL's ability to handle RL tasks
- Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
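
For readers unfamiliar with PPO, its core idea, the clipped surrogate objective, can be sketched in a few lines of plain Python. This is a generic illustration of the technique and not veRL's implementation: the function name `ppo_clipped_loss`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for the example, not values taken from this training run.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over a batch of sampled actions.

    new_logprobs: log-probabilities under the policy being updated
    old_logprobs: log-probabilities under the policy that generated the samples
    advantages:   advantage estimates for the same samples
    clip_eps:     clipping range (hypothetical default, not from this model)
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not new_logprobs:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)

    # PPO maximizes the objective, so the loss is its negated mean.
    return -total / len(new_logprobs)
```

Taking the minimum of the two terms removes any incentive to move the policy far from the one that produced the samples, which is what keeps updates stable even on a small dataset like the one used here.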

📊 VMLU Benchmark

| Evaluation date | STEM 🔬 | Social Science 🌍 | Humanities 📚 | Others 🎯 | Avg ⭐ |
|---|---|---|---|---|---|
| 07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43 |

🤝 Contributors

Developed with ❤️ by BlossomAI

Star ⭐️ this repo if you find it valuable!