Home
cd ../playbooks
Developer ToolsAdvanced

Scientific Skill: Pufferlib

High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10...

15 minutes
By K-Dense AISource
#scientific#claude-code#pufferlib#visualization#database#protein
CLAUDE.md Template

Download this file and place it in your project folder to get started.

# PufferLib - High-Performance Reinforcement Learning

## Overview

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.

## When to Use This Skill

Use this skill when:
- **Training RL agents** with PPO on any environment (single or multi-agent)
- **Creating custom environments** using the PufferEnv API
- **Optimizing performance** for parallel environment simulation (vectorization)
- **Integrating existing environments** from Gymnasium, PettingZoo, Atari, Procgen, etc.
- **Developing policies** with CNN, LSTM, or custom architectures
- **Scaling RL** to millions of steps per second for faster experimentation
- **Multi-agent RL** with native multi-agent environment support

## Core Capabilities

### 1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

**Quick start training:**
```bash
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py
```

**Python training loop:**
```python
import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()  # Collect rollouts
    trainer.train()     # Train on batch
    trainer.mean_and_log()  # Log results
```

**For comprehensive training guidance**, read `references/training.md` for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training
- Performance optimization tips
- Curriculum learning patterns

### 2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

**Basic environment structure:**
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)

        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}

        return obs, reward, done, info
```

**Use the template script:** `scripts/env_template.py` provides complete single-agent and multi-agent environment templates with examples of:
- Different observation space types (vector, image, dict)
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities

**For complete environment development**, read `references/environments.md` for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques

### 3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

**Vectorization setup:**
```python
import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
```

**Key optimizations:**
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker

**For vectorization optimization**, read `references/vectorization.md` for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting

### 4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

**Basic policy structure:**
```python
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```

**For complete policy development**, read `references/policies.md` for:
- CNN policies for image observations
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing

### 5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

**Gymnasium integration:**
```python
import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
```

**PettingZoo multi-agent:**
```python
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
```

**Supported frameworks:**
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...

**For integration details**, read `references/integration.md` for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging

## Quick Start Workflow

### For Training Existing Environments

1. Choose environment from Ocean suite or compatible framework
2. Use `scripts/train_template.py` as starting point
3. Configure hyperparameters for your task
4. Run training with CLI or Python script
5. Monitor with Weights & Biases or Neptune
6. Refer to `references/training.md` for optimization

### For Creating Custom Environments

1. Start with `scripts/env_template.py`
2. Define observation and action spaces
3. Implement `reset()` and `step()` methods
4. Test environment locally
5. Vectorize with `pufferlib.emulate()` or `make()`
6. Refer to `references/environments.md` for advanced patterns
7. Optimize with `references/vectorization.md` if needed

### For Policy Development

1. Choose architecture based on observations:
   - Vector observations → MLP policy
   - Image observations → CNN policy
   - Sequential tasks → LSTM policy
   - Complex observations → Multi-input policy
2. Use `layer_init` for proper weight initialization
3. Follow patterns in `references/policies.md`
4. Test with environment before full training

### For Performance Optimization

1. Profile current throughput (steps per second)
2. Check vectorization configuration (num_envs, num_workers)
3. Optimize environment code (in-place ops, numpy vectorization)
4. Consider C implementation for critical paths
5. Use `references/vectorization.md` for systematic optimization

## Resources

### scripts/

**train_template.py** - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup

**env_template.py** - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities

### references/

**training.md** - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting

**environments.md** - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging

**vectorization.md** - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices

**policies.md** - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing

**integration.md** - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration

## Tips for Success

1. **Start simple**: Begin with Ocean environments or Gymnasium integration before creating custom environments

2. **Profile early**: Measure steps per second from the start to identify bottlenecks

3. **Use templates**: `scripts/train_template.py` and `scripts/env_template.py` provide solid starting points

4. **Read references as needed**: Each reference file is self-contained and focused on a specific capability

5. **Optimize progressively**: Start with Python, profile, then optimize critical paths with C if needed

6. **Leverage vectorization**: PufferLib's vectorization is key to achieving high throughput

7. **Monitor training**: Use WandB or Neptune to track experiments and identify issues early

8. **Test environments**: Validate environment logic before scaling up training

9. **Check existing environments**: Ocean suite provides 20+ pre-built environments

10. **Use proper initialization**: Always use `layer_init` from `pufferlib.pytorch` for policies

## Common Use Cases

### Training on Standard Benchmarks
```python
# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
```

### Multi-Agent Learning
```python
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
```

### Custom Task Development
```python
# Create custom environment
class MyTask(PufferEnv):
    # ... implement environment ...

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
```

### High-Performance Optimization
```python
# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,      # Large batch
    num_workers=16,     # Many workers
    envs_per_worker=64  # Optimize per worker
)
```

## Installation

```bash
uv pip install pufferlib
```

## Documentation

- Official docs: https://puffer.ai/docs.html
- GitHub: https://github.com/PufferAI/PufferLib
- Discord: Community support available
README.md

What This Does

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.


Quick Start

Step 1: Create a Project Folder

mkdir -p ~/Projects/pufferlib

Step 2: Download the Template

Click Download above, then:

mv ~/Downloads/CLAUDE.md ~/Projects/pufferlib/

Step 3: Start Claude Code

cd ~/Projects/pufferlib
claude

Core Capabilities

1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

Quick start training:

# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py

Python training loop:

import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()  # Collect rollouts
    trainer.train()     # Train on batch
    trainer.mean_and_log()  # Log results

For comprehensive training guidance, read references/training.md for:

  • Complete training workflow and CLI options
  • Hyperparameter tuning with Protein
  • Distributed multi-GPU/multi-node training
  • Logger integration (Weights & Biases, Neptune)
  • Checkpointing and resume training
  • Performance optimization tips
  • Curriculum learning patterns

2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

Basic environment structure:

import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)

        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}

        return obs, reward, done, info

Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:

  • Different observation space types (vector, image, dict)
  • Action space variations (discrete, continuous, multi-discrete)
  • Multi-agent environment structure
  • Testing utilities

For complete environment development, read references/environments.md for:

  • PufferEnv API details and in-place operation patterns
  • Observation and action space definitions
  • Multi-agent environment creation
  • Ocean suite (20+ pre-built environments)
  • Performance optimization (Python to C workflow)
  • Environment wrappers and best practices
  • Debugging and validation techniques

3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

Vectorization setup:

import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS

Key optimizations:

  • Shared memory buffers for zero-copy observation passing
  • Busy-wait flags instead of pipes/queues
  • Surplus environments for async returns
  • Multiple environments per worker

For vectorization optimization, read references/vectorization.md for:

  • Architecture and performance characteristics
  • Worker and batch size configuration
  • Serial vs multiprocessing vs async modes
  • Shared memory and zero-copy patterns
  • Hierarchical vectorization for large scale
  • Multi-agent vectorization strategies
  • Performance profiling and troubleshooting

4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

Basic policy structure:

import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)

For complete policy development, read references/policies.md for:

  • CNN policies for image observations
  • Recurrent policies with optimized LSTM (3x faster inference)
  • Multi-input policies for complex observations
  • Continuous action policies
  • Multi-agent policies (shared vs independent parameters)
  • Advanced architectures (attention, residual)
  • Observation normalization and gradient clipping
  • Policy debugging and testing

5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

Gymnasium integration:

import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)

PettingZoo multi-agent:

# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)

Supported frameworks:

  • Gymnasium / OpenAI Gym
  • PettingZoo (parallel and AEC)
  • Atari (ALE)
  • Procgen
  • NetHack / MiniHack
  • Minigrid
  • Neural MMO
  • Crafter
  • GPUDrive
  • MicroRTS
  • Griddly
  • And more...

For integration details, read references/integration.md for:

  • Complete integration examples for each framework
  • Custom wrappers (observation, reward, frame stacking, action repeat)
  • Space flattening and unflattening
  • Environment registration
  • Compatibility patterns
  • Performance considerations
  • Integration debugging

Quick Start Workflow

For Training Existing Environments

  1. Choose environment from Ocean suite or compatible framework
  2. Use scripts/train_template.py as starting point
  3. Configure hyperparameters for your task
  4. Run training with CLI or Python script
  5. Monitor with Weights & Biases or Neptune
  6. Refer to references/training.md for optimization

For Creating Custom Environments

  1. Start with scripts/env_template.py
  2. Define observation and action spaces
  3. Implement reset() and step() methods
  4. Test environment locally
  5. Vectorize with pufferlib.emulate() or make()
  6. Refer to references/environments.md for advanced patterns
  7. Optimize with references/vectorization.md if needed

For Policy Development

  1. Choose architecture based on observations:
    • Vector observations → MLP policy
    • Image observations → CNN policy
    • Sequential tasks → LSTM policy
    • Complex observations → Multi-input policy
  2. Use layer_init for proper weight initialization
  3. Follow patterns in references/policies.md
  4. Test with environment before full training

For Performance Optimization

  1. Profile current throughput (steps per second)
  2. Check vectorization configuration (num_envs, num_workers)
  3. Optimize environment code (in-place ops, numpy vectorization)
  4. Consider C implementation for critical paths
  5. Use references/vectorization.md for systematic optimization

Resources

scripts/

train_template.py - Complete training script template with:

  • Environment creation and configuration
  • Policy initialization
  • Logger integration (WandB, Neptune)
  • Training loop with checkpointing
  • Command-line argument parsing
  • Multi-GPU distributed training setup

env_template.py - Environment implementation templates:

  • Single-agent PufferEnv example (grid world)
  • Multi-agent PufferEnv example (cooperative navigation)
  • Multiple observation/action space patterns
  • Testing utilities

references/

training.md - Comprehensive training guide:

  • Training workflow and CLI options
  • Hyperparameter configuration
  • Distributed training (multi-GPU, multi-node)
  • Monitoring and logging
  • Checkpointing
  • Protein hyperparameter tuning
  • Performance optimization
  • Common training patterns
  • Troubleshooting

environments.md - Environment development guide:

  • PufferEnv API and characteristics
  • Observation and action spaces
  • Multi-agent environments
  • Ocean suite environments
  • Custom environment development workflow
  • Python to C optimization path
  • Third-party environment integration
  • Wrappers and best practices
  • Debugging

vectorization.md - Vectorization optimization:

  • Architecture and key optimizations
  • Vectorization modes (serial, multiprocessing, async)
  • Worker and batch configuration
  • Shared memory and zero-copy patterns
  • Advanced vectorization (hierarchical, custom)
  • Multi-agent vectorization
  • Performance monitoring and profiling
  • Troubleshooting and best practices

policies.md - Policy architecture guide:

  • Basic policy structure
  • CNN policies for images
  • LSTM policies with optimization
  • Multi-input policies
  • Continuous action policies
  • Multi-agent policies
  • Advanced architectures (attention, residual)
  • Observation processing and unflattening
  • Initialization and normalization
  • Debugging and testing

integration.md - Framework integration guide:

  • Gymnasium integration
  • PettingZoo integration (parallel and AEC)
  • Third-party environments (Procgen, NetHack, Minigrid, etc.)
  • Custom wrappers (observation, reward, frame stacking, etc.)
  • Space conversion and unflattening
  • Environment registration
  • Compatibility patterns
  • Performance considerations
  • Debugging integration

Tips for Success

  1. Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments

  2. Profile early: Measure steps per second from the start to identify bottlenecks

  3. Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points

  4. Read references as needed: Each reference file is self-contained and focused on a specific capability

  5. Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed

  6. Leverage vectorization: PufferLib's vectorization is key to achieving high throughput

  7. Monitor training: Use WandB or Neptune to track experiments and identify issues early

  8. Test environments: Validate environment logic before scaling up training

  9. Check existing environments: Ocean suite provides 20+ pre-built environments

  10. Use proper initialization: Always use layer_init from pufferlib.pytorch for policies

Common Use Cases

Training on Standard Benchmarks

# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)

Multi-Agent Learning

# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)

Custom Task Development

# Create custom environment
class MyTask(PufferEnv):
    # ... implement environment ...

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)

High-Performance Optimization

# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,      # Large batch
    num_workers=16,     # Many workers
    envs_per_worker=64  # Optimize per worker
)

Installation

uv pip install pufferlib

Documentation

$Related Playbooks