VizDoom Health Gathering Supreme - APPO Agent

Model Environment Framework

A high-performance reinforcement learning agent trained using APPO (Asynchronous Proximal Policy Optimization) on the VizDoom Health Gathering Supreme environment. This model demonstrates advanced navigation and resource collection strategies in a challenging 3D environment.

๐Ÿ† Performance Metrics

  • Mean Reward: 11.46 ยฑ 3.37
  • Training Steps: 4,005,888 environment steps
  • Episodes Completed: 978 training episodes
  • Architecture: Convolutional Neural Network with shared weights

๐ŸŽฎ Environment Description

The VizDoom Health Gathering Supreme environment is a challenging first-person navigation task where the agent must:

  • Navigate through a complex 3D maze-like environment
  • Collect health packs scattered throughout the level
  • Avoid obstacles and navigate efficiently
  • Maximize survival time while gathering resources
  • Handle visual complexity with realistic 3D graphics

Environment Specifications

  • Observation Space: RGB images (72ร—128ร—3)
  • Action Space: Discrete movement and turning actions
  • Episode Length: Variable (until health depletes or time limit)
  • Difficulty: Supreme (highest difficulty level)

๐Ÿง  Model Architecture

Network Configuration

  • Algorithm: APPO (Asynchronous Proximal Policy Optimization)
  • Encoder: Convolutional Neural Network
    • Input: 3-channel RGB images (72ร—128)
    • Convolutional layers with ReLU activation
    • Output: 512-dimensional feature representation
  • Policy Head: Fully connected layers for action prediction
  • Value Head: Critic network for value function estimation

Training Configuration

  • Framework: Sample-Factory 2.0
  • Batch Size: Optimized for parallel processing
  • Learning Rate: Adaptive scheduling
  • Discount Factor: Standard RL discount
  • Entropy Regularization: Balanced exploration-exploitation

๐Ÿ“ฅ Installation & Setup

Prerequisites

# Install Sample-Factory
pip install sample-factory[all]

# Install VizDoom
pip install vizdoom

Download the Model

python -m sample_factory.huggingface.load_from_hub -r Adilbai/rl_course_vizdoom_health_gathering_supreme

๐Ÿš€ Usage

Running the Trained Agent

# Basic evaluation
python -m sample_factory.enjoy --algo=APPO --env=VizdoomHealthGathering-v0 \
    --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme

# With video recording
python -m sample_factory.enjoy --algo=APPO --env=VizdoomHealthGathering-v0 \
    --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme \
    --save_video --video_frames=10000 --no_render

Python API Usage

from sample_factory.enjoy import enjoy
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args

# Configure the environment
env_name = "VizdoomHealthGathering-v0"
cfg = parse_full_cfg(parse_sf_args([
    "--algo=APPO",
    f"--env={env_name}",
    "--train_dir=./train_dir",
    "--experiment=rl_course_vizdoom_health_gathering_supreme"
]))

# Run evaluation
status = enjoy(cfg)

Continue Training

python -m sample_factory.train --algo=APPO --env=VizdoomHealthGathering-v0 \
    --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme \
    --restart_behavior=resume --train_for_env_steps=10000000

๐Ÿ“Š Training Results

Learning Curve

The agent achieved consistent improvement throughout training:

  • Initial Performance: Random exploration
  • Mid Training: Developed basic navigation skills
  • Final Performance: Strategic health pack collection with optimal pathing

Key Behavioral Patterns

  • Efficient Navigation: Learned to navigate the maze structure
  • Resource Prioritization: Focuses on accessible health packs
  • Obstacle Avoidance: Developed spatial awareness
  • Time Management: Balances exploration vs exploitation

๐ŸŽฏ Evaluation Protocol

Standard Evaluation

python -m sample_factory.enjoy --algo=APPO --env=VizdoomHealthGathering-v0 \
    --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme \
    --max_num_episodes=100 --max_num_frames=100000

Performance Metrics

  • Episode Reward: Total health packs collected per episode
  • Survival Time: Duration before episode termination
  • Collection Efficiency: Health packs per time unit
  • Navigation Success: Percentage of successful maze traversals

๐Ÿ”ง Technical Details

Model Files

  • config.json: Complete training configuration
  • checkpoint_*.pth: Model weights and optimizer state
  • sf_log.txt: Detailed training logs
  • stats.json: Performance statistics

Hardware Requirements

  • GPU: NVIDIA GPU with CUDA support (recommended)
  • RAM: 8GB+ system memory
  • Storage: 2GB+ free space for model and dependencies

Troubleshooting

Common Issues

  1. Checkpoint Loading Errors

    # If you encounter encoder architecture mismatches
    # Use the fixed checkpoint with updated key mapping
    
  2. Environment Not Found

    pip install vizdoom
    # Ensure VizDoom is properly installed
    
  3. CUDA Errors

    # For CPU-only evaluation
    python -m sample_factory.enjoy --device=cpu [other args]
    

๐Ÿ“ˆ Benchmarking

Comparison with Baselines

  • Random Agent: ~0.5 average reward
  • Rule-based Agent: ~5.0 average reward
  • This APPO Agent: 8.09 average reward

Performance Analysis

The agent demonstrates:

  • Superior spatial reasoning compared to simpler approaches
  • Robust generalization across different episode initializations
  • Efficient resource collection strategies
  • Stable performance with low variance

๐Ÿ”ฌ Research Applications

This model serves as a strong baseline for:

  • Navigation research in complex 3D environments
  • Multi-objective optimization (survival + collection)
  • Transfer learning to related VizDoom scenarios
  • Curriculum learning progression studies

๐Ÿค Contributing

Contributions are welcome! Areas for improvement:

  • Hyperparameter optimization
  • Architecture modifications
  • Multi-agent scenarios
  • Domain randomization

๐Ÿ“š References

๐Ÿ“ Citation

@misc{vizdoom_health_gathering_supreme_2025,
  title={VizDoom Health Gathering Supreme APPO Agent},
  author={Adilbai},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Adilbai/rl_course_vizdoom_health_gathering_supreme}
}

๐Ÿ“„ License

This model is released under the MIT License. See the LICENSE file for details.


Note: This model was trained as part of a reinforcement learning course and demonstrates the effectiveness of modern RL algorithms on challenging 3D navigation tasks.

Downloads last month
-
Video Preview
loading

Evaluation results

  • mean_reward on doom_health_gathering_supreme
    self-reported
    11.46 +/- 3.37