Buffers
Specifying buffer size and type via command-line arguments will override the settings in the configuration files:

- --bsize: Buffer size (default is as specified in the configuration file)
- --btype: Buffer type (default is as specified in the configuration file); options are ER-v0 / PER-v0
In the absence of command-line overrides, buffer settings can be specified in the agent's configuration file:

- buffer_type: Set to either ER-v0 or PER-v0. The default setting uses an experience replay buffer.
- buffer_capacity: The capacity of the buffer; the default is 1,000,000.
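As a sketch, the corresponding entries in an agent's configuration file might look like the following (the exact file layout and key nesting are assumptions; check the repository's agent configuration files for the actual format):

```yaml
# Hypothetical agent configuration fragment
buffer_type: ER-v0        # or PER-v0 for prioritized experience replay
buffer_capacity: 1000000
```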
For Prioritized Experience Replay (PER), you can specify the prioritization scheme in the configuration files:

- prioritization_type: Choose between proportional or rank.
- alpha: The alpha value, which depends on the prioritization type (default 0.6 for proportional, 0.7 for rank).
- beta: The beta value, also depending on the prioritization type (default 0.4 for proportional, 0.5 for rank).
- beta_increment: The rate at which beta is increased; the default is 0.001.
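To make the roles of alpha and beta concrete, here is a minimal sketch of proportional prioritized sampling. The function name and details are illustrative, not the repository's implementation: priorities are raised to the power alpha to form sampling probabilities, and beta controls the importance-sampling correction weights (beta itself would be annealed toward 1 by beta_increment after each sampling step).

```python
import numpy as np

def proportional_sample(priorities, nsamples, alpha=0.6, beta=0.4):
    """Sample indices with probability p_i**alpha / sum(p**alpha) and
    return importance-sampling weights (N * P(i))**(-beta), normalized
    by their maximum so the largest weight is 1."""
    priorities = np.asarray(priorities, dtype=np.float64)
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=nsamples, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights
```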
Run the script with command-line overrides for the buffer:

python -O drivers/run_continuous.py --agent KerasTD3-v0 --env HalfCheetah-v4 --btype PER-v0 --bsize 1000000 --nepisodes 1000

The Replay class is an abstract base class (ABC) designed for creating various types of replay buffers in reinforcement learning (RL). Replay buffers store and manage the experiences of an agent during training. Experiences typically include states, actions, rewards, next states, and done signals. This base class outlines the essential structure and functionalities required for any replay buffer implementation.
Initializes a new instance of the Replay buffer. This method is intended to define all key variables required for all buffers. However, in this base class, the method body is left empty (pass) to be defined by subclasses.
Parameters:

- state: The initial state of the environment.
- action: The action taken by the agent.
- reward: The reward received after taking the action.
- next_state: The state of the environment after the action is taken.
- done: A boolean indicating whether the episode has ended.
- probability: The probability distribution used for sampling experiences.
Note: Subclasses should provide implementations that initialize these parameters as needed for their specific type of replay buffer.
An abstract method that must be implemented by subclasses. It is used to add new experiences into the buffer.
Parameters:

- memory: The experience to add to the buffer. The structure of memory should align with the expected format of the replay buffer.
An abstract method that must be implemented by subclasses. It should return a sample of experiences from the buffer based on a probability distribution.
Parameters:

- nsamples: The number of samples to return from the buffer.
Returns:
- A sample of experiences from the buffer.
An abstract method that must be implemented by subclasses. It should save the current state of the buffer to a file.
Parameters:

- filename: The name of the file where the buffer will be saved. Defaults to 'replay_buffer.npy'.
An abstract method that must be implemented by subclasses. It should save the configuration of the buffer. The specifics of what constitutes the buffer's configuration are left to the subclass's discretion.
An abstract method that must be implemented by subclasses. It should load previously saved experiences into the buffer.
Parameters:

- filename: The name of the file from which to load the buffer.
An abstract method that must be implemented by subclasses. It should return the number of experiences currently stored in the buffer.
Returns:
- The number of memories in the buffer.
To implement a specific type of replay buffer (e.g., a simple FIFO buffer or prioritized experience replay), subclass Replay and provide concrete implementations for all of the abstract methods. This includes initializing the necessary variables in the constructor, handling the addition and sampling of experiences, and managing the persistence of the buffer's state and configuration.
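As an illustration, a minimal FIFO buffer following the interface described above might look like the sketch below. The class and method names (add, sample, save, load, __len__) are assumptions chosen to mirror this documentation; match them to the actual Replay ABC in the codebase before use.

```python
import random
from collections import deque

import numpy as np


class FIFOReplay:
    """Minimal FIFO replay buffer sketch; not the repository's implementation."""

    def __init__(self, capacity=1_000_000):
        # deque with maxlen evicts the oldest experience once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def add(self, memory):
        # memory is an experience tuple: (state, action, reward, next_state, done)
        self.buffer.append(memory)

    def sample(self, nsamples):
        # Uniform sampling; a prioritized buffer would sample from a
        # probability distribution over priorities instead.
        return random.sample(list(self.buffer), nsamples)

    def save(self, filename='replay_buffer.npy'):
        np.save(filename, np.array(list(self.buffer), dtype=object),
                allow_pickle=True)

    def load(self, filename='replay_buffer.npy'):
        for memory in np.load(filename, allow_pickle=True):
            self.buffer.append(memory)

    def __len__(self):
        return len(self.buffer)
```

With capacity 3, adding five experiences keeps only the three most recent, and sample returns a uniform batch of the requested size.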