metadata
license: mit
language:
  - vi
datasets:
  - 5CD-AI/Vietnamese-cosmos-qa-gg-translated
base_model:
  - Qwen/Qwen2.5-0.5B
library_name: transformers
tags:
  - text-generation-inference

🌟 BloomVN-0.5B-ppo

A multilingual model fine-tuned for the Vietnamese language

📋 Overview

This model is a small-scale experiment (0.5B parameters) that tests the reinforcement learning capabilities of the veRL framework. It was trained with PPO (Proximal Policy Optimization) on a limited dataset to evaluate veRL's performance and training behavior.

🔧 Method

The experiments were run with veRL and focused on:

  • Implementing the PPO algorithm with a 0.5B-parameter model
  • Running training experiments on a small dataset
  • Testing the veRL framework's ability to handle RL tasks
  • Evaluating training efficiency and model behavior

This lightweight approach allowed us to assess veRL's performance in a controlled, small-scale environment.
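
For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in a few lines of plain Python. This is an illustrative sketch only, not veRL's implementation and not the training code used for this model: the function name `ppo_clipped_loss`, its arguments, and the default `clip_eps=0.2` are assumptions chosen for the example.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    All names and the default clip_eps are illustrative assumptions;
    this is not veRL's API.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not new_logprobs:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the rollout policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)

    # PPO maximizes the surrogate, so the loss is its negated mean.
    return -total / len(new_logprobs)
```

Clipping limits how far a single update can move the policy away from the one that generated the rollouts, which matters when training on a small dataset like the one used here.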

📊 VMLU Benchmark

| EVALUATION DATE | STEM 🔬 | SOCIAL SCIENCE 🌍 | HUMANITIES 📚 | OTHERS 🎯 | AVG ⭐ |
|---|---|---|---|---|---|
| 07/02/2025 | 23.18 | 32.84 | 32.71 | 33.67 | 29.43 |

🤝 Contributors

Developed with ❤️ by BlossomAI


Star ⭐️ this repo if you find it valuable!