SPIRAL OctoThinker-8B Multi-Agent Model

This model was trained using the SPIRAL (Self-Play Iterative Reinforcement Learning for Adaptation and Learning) framework.

Model Details

  • Base Model: OctoAI/OctoThinker-8B
  • Training Framework: SPIRAL
  • Checkpoint: step_00288
  • Model Size: 8B parameters
  • Training Date: 2025-08-31

Training Configuration

The model was trained with self-play on multiple environments (a minimal interaction sketch follows the list):

  • KuhnPoker-v1
  • TicTacToe-v0
  • SimpleNegotiation-v1
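
Below is a minimal sketch of interacting with the checkpoint in one of these games. It assumes a TextArena-style environment interface (ta.make, env.reset, env.get_observation, env.step); exact signatures may differ from the version used during training, and get_model_move is a hypothetical helper that queries the model (loaded as in the Usage section below) for its next action.

import textarena as ta

def get_model_move(observation: str) -> str:
    # Hypothetical helper: format the observation as a prompt, call
    # model.generate, and return the text of the chosen action.
    raise NotImplementedError

env = ta.make(env_id="KuhnPoker-v1")  # or "TicTacToe-v0" / "SimpleNegotiation-v1"
env.reset(num_players=2)

done = False
while not done:
    player_id, observation = env.get_observation()
    action = get_model_move(observation)
    done, info = env.step(action=action)

rewards = env.close()  # per-player rewards once the game ends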

Training Parameters

{
  "learning_rate": "1e-6",
  "train_batch_size": 128,
  "num_ppo_epochs": 2,
  "temperature": 1.0,
  "max_model_len": 16384,
  "environments": [
    "KuhnPoker-v1",
    "TicTacToe-v0",
    "SimpleNegotiation-v1"
  ],
  "base_model": "OctoThinker-8B",
  "framework": "SPIRAL"
}
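
The temperature above was used for rollout sampling during training. A hedged sketch of mirroring it at inference with transformers follows; every value other than temperature is an illustrative assumption, not taken from the config.

from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,      # matches "temperature" in the training config
    max_new_tokens=1024,  # illustrative choice; max_model_len (16384) bounds the context window, not generation length
)

# Later: outputs = model.generate(**inputs, generation_config=gen_config)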

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "the-acorn-ai/spiral-octothinker-8b-multi-three-games-step00288"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate text
inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
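
If the tokenizer ships a chat template (an assumption worth checking via tokenizer.chat_template), prompts can also be formatted as a conversation:

# Assumes a chat template is present; fall back to plain-text prompts otherwise.
messages = [
    {"role": "user", "content": "You are playing Kuhn Poker. Your card is K. What is your move?"}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))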

License

This model is licensed under the Apache License 2.0.
