# Phase 2: Benchmark Problem-Solving System Complete
## ✅ Implemented Components
### 1. BenchmarkProblemLoader
- **File**: `absolute_zero_reasoner/testtime/benchmark_loader.py`
- **Features**:
  - Loads HumanEval+ and MBPP+ problems
  - Extracts test cases (by parsing `assert` statements)
  - Validates solutions (syntax check + execution)
  - Supports batch loading and reports problem statistics
- **Based on**: extends the existing `load_humaneval_problem` function
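
The assert-parsing and two-stage validation steps listed above could look roughly like the following. This is a minimal sketch, not the contents of `benchmark_loader.py`: the helper names `extract_assert_tests` and `validate_solution` are hypothetical, and the code runs untrusted solutions with a bare `exec`, which a real loader would presumably sandbox.

```python
import ast


def extract_assert_tests(test_code: str) -> list[str]:
    """Return the source text of every top-level assert statement (hypothetical helper)."""
    tree = ast.parse(test_code)
    return [ast.unparse(node) for node in tree.body if isinstance(node, ast.Assert)]


def validate_solution(solution: str, tests: list[str]) -> bool:
    """Stage 1: syntax check. Stage 2: execute the solution against each assert."""
    try:
        ast.parse(solution)
    except SyntaxError:
        return False
    namespace: dict = {}
    try:
        exec(solution, namespace)  # assumption: no sandboxing in this sketch
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True


tests = extract_assert_tests("assert add(1, 2) == 3\nassert add(0, 0) == 0")
print(validate_solution("def add(a, b):\n    return a + b", tests))
```

Parsing with `ast` rather than splitting on the word `assert` keeps multi-line assertions intact and ignores asserts nested inside helper functions.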
### 2. InitialSolutionGenerator
- **File**: `absolute_zero_reasoner/testtime/solution_generator.py`
- **Features**:
  - AZR-style model loading (flash attention, gradient checkpointing)
  - Greedy generation (same as AZR evaluation)
  - Automatic recovery of missing function definitions
  - Fallback solution generation (per-problem templates)
- **Based on**: the existing `generate_initial_solution` function, refactored into a class
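
One plausible reading of "automatic recovery of missing function definitions" is that the model sometimes returns only a function body, and the generator re-attaches the signature from the prompt. The sketch below illustrates that idea under this assumption; `restore_function_definition` is a hypothetical name and the actual logic in `solution_generator.py` may differ.

```python
import ast
import textwrap


def restore_function_definition(prompt: str, completion: str) -> str:
    """Prepend the prompt's signature when the completion is only a function body."""
    try:
        tree = ast.parse(completion)
        if any(isinstance(node, ast.FunctionDef) for node in tree.body):
            return completion  # already a complete definition
    except SyntaxError:
        pass  # a bare indented body does not parse on its own
    # Normalize the body to one indentation level under the signature.
    body = textwrap.indent(textwrap.dedent(completion).strip("\n"), "    ")
    return prompt.rstrip("\n") + "\n" + body + "\n"


fixed = restore_function_definition("def add(a, b):\n", "    return a + b")
print(fixed)
```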
### 3. TestTimeLogger
- **File**: `absolute_zero_reasoner/testtime/logger.py`
- **Features**:
  - Requirement 1: benchmark problem + LLM answer + correctness
  - Requirement 2: IPO extraction + task generation logs
  - Requirement 3: task accuracy + reward logs
  - Requirement 4: VeRL training progress logs
  - Logs are stored as structured JSON
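
As an illustration of structured JSON logging for Requirement 1, the sketch below appends one record per event to a per-category file. The function name `log_record`, the JSON Lines layout, and the field names are all assumptions made for this example, not the interface of `logger.py`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_record(log_dir: Path, category: str, payload: dict) -> Path:
    """Append one JSON line to <log_dir>/<category>.jsonl and return the file path."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{category}.jsonl"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "category": category,
        **payload,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Hypothetical record for Requirement 1: problem, LLM answer, correctness.
log_dir = Path(tempfile.mkdtemp())
path = log_record(
    log_dir,
    "benchmark",
    {"problem_id": "Mbpp/2", "solution": "def f(): ...", "is_correct": True},
)
```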
### 4. Configuration System
- **File**: `absolute_zero_reasoner/testtime/config.py`
- **Classes**: `TestTimeConfig`, `BenchmarkConfig`
- **Features**: AZR-compatible settings plus TestTime-specific settings
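
A configuration pair like this is commonly expressed as nested dataclasses. The sketch below reuses the two class names from `config.py`, but every field and default is hypothetical and chosen only to mirror the features described in this document (greedy decoding, flash attention, gradient checkpointing).

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class BenchmarkConfig:
    # All fields are illustrative assumptions, not the real config schema.
    name: str = "mbpp"
    data_path: str = "path/to/benchmark.jsonl"
    max_problems: Optional[int] = None


@dataclass
class TestTimeConfig:
    # All fields are illustrative assumptions, not the real config schema.
    model_name: str = "path/to/model"
    do_sample: bool = False  # greedy generation
    use_flash_attention: bool = True
    gradient_checkpointing: bool = True
    benchmark: BenchmarkConfig = field(default_factory=BenchmarkConfig)


cfg = TestTimeConfig()
print(asdict(cfg))
```

Using `default_factory` for the nested config avoids sharing one mutable `BenchmarkConfig` instance across all `TestTimeConfig` objects.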
## 🧪 Test Results
### Basic Functionality Tests (✅ 3/3 passed)
```
Configuration: ✅ PASS
Logger: ✅ PASS
BenchmarkLoader: ✅ PASS
```
### Verified Features
- ✅ MBPP problem loading (Mbpp/2 loaded successfully)
- ✅ Problem statistics (378 problems confirmed)
- ✅ Logging system (5 categories)
- ✅ Configuration management (AZR-compatible)
## 📁 Generated Structure
```
TestTime-RLVR-v2/absolute_zero_reasoner/testtime/
├── __init__.py            # Package initialization
├── config.py              # Configuration classes
├── benchmark_loader.py    # Benchmark loader
├── solution_generator.py  # Solution generator
└── logger.py              # Logging system
```
## 🗑️ Cleaned-Up Items
- ✅ Deleted Python cache files (`__pycache__`, `*.pyc`)
- ✅ Cleaned up unnecessary imports (components not yet implemented are commented out)
- ✅ Test files stored temporarily in `/tmp/azr/`
## 🎯 Next Steps (Phase 3)
The **IPO Triple extraction system** to be implemented in Phase 3:
1. **IPOTripleExtractor** - IPO extraction based on the AZR Python Executor
2. **TripleValidator** - validation of the extracted triples
3. **AZR integration** - uses `utils/code_utils/python_executor.py`
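
To make the planned components concrete, here is a rough sketch of extraction and validation, assuming an IPO triple is an (input, program, output) record obtained by running a program on sample inputs. The names `IPOTriple`, `extract_ipo_triples`, and `validate_triple` are hypothetical, and a plain `exec` stands in for the AZR Python executor, whose interface is not described here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class IPOTriple:
    program: str
    input_args: tuple
    output: Any


def extract_ipo_triples(program: str, func_name: str, inputs: list[tuple]) -> list[IPOTriple]:
    """Run the program on each input and record (input, program, output)."""
    namespace: dict = {}
    exec(program, namespace)  # assumption: stand-in for a sandboxed executor
    func = namespace[func_name]
    triples = []
    for args in inputs:
        try:
            triples.append(IPOTriple(program, args, func(*args)))
        except Exception:
            continue  # skip inputs on which the program raises
    return triples


def validate_triple(triple: IPOTriple, func_name: str) -> bool:
    """Re-execute the program and confirm it reproduces the recorded output."""
    namespace: dict = {}
    exec(triple.program, namespace)
    return namespace[func_name](*triple.input_args) == triple.output


program = "def add(a, b):\n    return a + b"
triples = extract_ipo_triples(program, "add", [(1, 2), (3, 4)])
print([t.output for t in triples])
```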
### Plan for Reusing AZR Components
- `absolute_zero_reasoner/utils/code_utils/python_executor.py` - code execution
- `absolute_zero_reasoner/trainer/ppo/azr_ray_trainer.py:641-655` - IPO generation logic
- `absolute_zero_reasoner/rewards/reward_managers.py:220-233` - validation logic
---
**Created**: 2025-07-16
**Status**: ✅ Complete
**Tests**: ✅ Passed (3/3)