
Buffers

Armen Kasparian edited this page Mar 28, 2024 · 5 revisions

Selecting the Replay Buffer

Command Line Overrides (Preferred)

Specifying buffer size and type via command line arguments will override the settings in the configuration files:

  • --bsize: Buffer size (falls back to the configuration file when omitted)
  • --btype: Buffer type, either ER-v0 or PER-v0 (falls back to the configuration file when omitted)

Configuration via Agent Config File

In the absence of command-line overrides, buffer settings can be specified in the agent's configuration file:

  • buffer_type: Set to either ER-v0 or PER-v0. By default, a standard experience replay buffer (ER-v0) is used.
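
The exact file layout depends on the repository's configuration format; assuming a YAML-style agent config, the entry might look like:

```yaml
# Hypothetical sketch of the agent config entry (actual layout may differ):
buffer_type: ER-v0   # or PER-v0 for prioritized experience replay
```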

Configuration File Details

General Configuration for ER/PER

  • buffer_capacity: The capacity of the buffer, default is 1,000,000.

Specifics for PER

For Prioritized Experience Replay (PER), you can additionally specify the prioritization scheme in the configuration files:

  • prioritization_type: Choose between proportional or rank.
  • alpha: The alpha value, which depends on the prioritization type (default 0.6 for proportional, 0.7 for rank).
  • beta: The beta value, also depending on the prioritization type (default 0.4 for proportional, 0.5 for rank).
  • beta_increment: The rate at which beta is increased, default is 0.001.
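
Putting the options above together, a PER configuration might look like the following (a hypothetical YAML-style sketch; the key names follow the options listed above, but the actual file format depends on the repository):

```yaml
# Hypothetical sketch; key names follow the documented options.
buffer_type: PER-v0
buffer_capacity: 1000000
prioritization_type: proportional   # or: rank
alpha: 0.6                          # default 0.7 for rank-based
beta: 0.4                           # default 0.5 for rank-based
beta_increment: 0.001
```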

Examples

Run the script with command line overrides for the buffer:

```shell
python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --btype PER-v0 --bsize 1000000 --nepisodes 1000
```

Replay Buffer Base Class Documentation

The Replay class is an abstract base class (ABC) designed for creating various types of replay buffers in reinforcement learning (RL). Replay buffers store and manage the experiences of an agent during training. Experiences typically include states, actions, rewards, next states, and done signals. This base class outlines the essential structure and functionalities required for any replay buffer implementation.

Constructor

__init__(self, state, action, reward, next_state, done, probability)

Initializes a new instance of the Replay buffer. This method declares the key variables that every buffer requires; in the base class, however, the body is left empty (pass), and subclasses are responsible for the actual initialization.

Parameters:

  • state: The initial state of the environment.
  • action: The action taken by the agent.
  • reward: The reward received after taking the action.
  • next_state: The state of the environment after the action is taken.
  • done: A boolean indicating whether the episode has ended.
  • probability: The probability distribution used for sampling experiences.

Note: Subclasses should provide implementations that initialize these parameters as needed for their specific type of replay buffer.

Methods

record(self, memory)

An abstract method that must be implemented by subclasses. It is used to add new experiences into the buffer.

Parameters:

  • memory: The experience to add to the buffer. The structure of memory should align with the expected format of the replay buffer.

sample(self, nsamples)

An abstract method that must be implemented by subclasses. It should return a sample of experiences from the buffer based on a probability distribution.

Parameters:

  • nsamples: The number of samples to return from the buffer.

Returns:

  • A sample of experiences from the buffer.

save(self, filename='replay_buffer.npy')

An abstract method that must be implemented by subclasses. It should save the current state of the buffer to a file.

Parameters:

  • filename: The name of the file where the buffer will be saved. Defaults to 'replay_buffer.npy'.

save_cfg(self)

An abstract method that must be implemented by subclasses. It should save the configuration of the buffer. The specifics of what constitutes the buffer's configuration are left to the subclass's discretion.

load(self, filename)

An abstract method that must be implemented by subclasses. It should load previously saved experiences into the buffer.

Parameters:

  • filename: The name of the file from which to load the buffer.

size(self)

An abstract method that must be implemented by subclasses. It should return the number of experiences currently stored in the buffer.

Returns:

  • The number of memories in the buffer.
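
Taken together, the documented interface can be sketched as the following abstract base class. This is an illustrative sketch, not the actual source; the real class may differ in decorators, docstrings, and signatures:

```python
from abc import ABC, abstractmethod


class Replay(ABC):
    """Sketch of the documented replay-buffer interface."""

    def __init__(self, state, action, reward, next_state, done, probability):
        # The base class leaves initialization to subclasses.
        pass

    @abstractmethod
    def record(self, memory):
        """Add a new experience to the buffer."""

    @abstractmethod
    def sample(self, nsamples):
        """Return nsamples experiences drawn from the buffer."""

    @abstractmethod
    def save(self, filename='replay_buffer.npy'):
        """Persist the buffer contents to a file."""

    @abstractmethod
    def save_cfg(self):
        """Persist the buffer's configuration."""

    @abstractmethod
    def load(self, filename):
        """Restore previously saved experiences."""

    @abstractmethod
    def size(self):
        """Return the number of stored experiences."""
```

Because every method is abstract, attempting to instantiate Replay directly raises a TypeError; only concrete subclasses can be constructed.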

Implementing a Replay Buffer

To implement a specific type of replay buffer (e.g., a simple FIFO buffer, prioritized experience replay), you must subclass Replay and provide concrete implementations for all the abstract methods. This includes initializing necessary variables in the constructor, handling the addition and sampling of experiences, and managing the persistence of the buffer's state and configuration.
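
As a minimal illustration, a hypothetical FIFO buffer with uniform sampling might look like the sketch below. The class name FIFOReplay is invented for this example, only the core methods are shown (persistence is omitted), and a trimmed stand-in for the base class is included so the snippet is self-contained:

```python
import random
from abc import ABC, abstractmethod


class Replay(ABC):
    # Trimmed stand-in for the documented base class; the real one
    # also declares save, save_cfg, and load.
    @abstractmethod
    def record(self, memory): ...

    @abstractmethod
    def sample(self, nsamples): ...

    @abstractmethod
    def size(self): ...


class FIFOReplay(Replay):
    """A minimal FIFO buffer with uniform sampling (illustrative only)."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity
        self.memories = []

    def record(self, memory):
        # Drop the oldest experience once the buffer is full.
        if len(self.memories) >= self.capacity:
            self.memories.pop(0)
        self.memories.append(memory)

    def sample(self, nsamples):
        # Uniform sampling without replacement.
        return random.sample(self.memories, min(nsamples, len(self.memories)))

    def size(self):
        return len(self.memories)
```

A prioritized buffer would follow the same skeleton but maintain per-experience priorities and draw samples according to the probability distribution they induce, rather than uniformly.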