Quick Start with garage¶
What is garage?¶
garage is a reinforcement learning (RL) toolkit for developing and evaluating algorithms. The garage library also provides a collection of state-of-the-art implementations of RL algorithms.
The toolkit provides a wide range of modular tools for implementing RL algorithms, including:
Composable neural network models
Replay buffers
High-performance samplers
An expressive experiment definition interface
Tools for reproducibility (e.g. set a global random seed which all components respect)
Logging to many outputs, including TensorBoard
Reliable experiment checkpointing and resuming
Environment interfaces for many popular benchmark suites
Support for running garage in diverse environments, including always up-to-date Docker containers
Why garage?¶
garage aims to provide both researchers and developers:
a flexible and structured tool for developing algorithms to solve a variety of RL problems,
a standardized and reproducible environment for experimenting and evaluating RL algorithms,
a collection of benchmarks and examples of RL algorithms.
Kick Start garage¶
This quickstart shows how to get started with garage in 5 minutes.
import garage
Algorithms¶
An array of algorithms is available in garage:
Algorithm | Framework(s)
CEM |
CMA-ES |
REINFORCE (a.k.a. VPG) |
DDPG |
DQN |
DDQN | TensorFlow
ERWR |
NPO |
PPO | PyTorch, TensorFlow
REPS |
TD3 |
TNPG |
TRPO | PyTorch, TensorFlow
MAML | PyTorch
RL2 |
PEARL |
SAC |
MTSAC | PyTorch
MTPPO | PyTorch, TensorFlow
MTTRPO | PyTorch, TensorFlow
Task Embedding | TensorFlow
Behavioral Cloning |
They are organized in the GitHub repository as:
└── garage
    ├── envs
    ├── experiment
    ├── misc
    ├── np
    ├── plotter
    ├── replay_buffer
    ├── sampler
    ├── tf
    └── torch
Note: clickable links represent the directories that contain algorithms.
A simple PyTorch example that imports the TRPO algorithm, as well as the policy GaussianMLPPolicy and the value function GaussianMLPValueFunction, is shown below:
import gym
import torch

from garage.envs import GarageEnv, normalize
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction


def trpo_garage_pytorch(env_id):
    # env_id is the ID of the Gym environment to train on
    env = GarageEnv(normalize(gym.make(env_id)))

    # Gaussian policy backed by an MLP with two hidden layers of 32 units
    policy = PyTorch_GMP(env.spec,
                         hidden_sizes=[32, 32],
                         hidden_nonlinearity=torch.tanh,
                         output_nonlinearity=None)

    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    algo = PyTorch_TRPO(env_spec=env.spec,
                        policy=policy,
                        value_function=value_function,
                        max_episode_length=100,
                        discount=0.99,
                        gae_lambda=0.97)
The full code can be found here.
To learn more about implementing new algorithms, see this guide.
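The discount and gae_lambda arguments in the example above are the two parameters of generalized advantage estimation (GAE). The sketch below is a plain-Python illustration of how these two numbers shape the advantage signal. It is not garage's implementation: the function name gae_advantages and its list-based interface are assumptions made for this example.

```python
def gae_advantages(rewards, values, discount=0.99, gae_lambda=0.97):
    """Compute GAE advantages for a single episode.

    `values` must have one more entry than `rewards`: the last entry is the
    value estimate of the final state (0.0 if the episode terminated).
    This is an illustrative helper, not part of garage.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the accumulated future term.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + discount * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + discount * gae_lambda * running
        advantages[t] = running
    return advantages


# gae_lambda=0 keeps only the one-step TD error; gae_lambda=1 sums all
# discounted future TD errors.
one_step = gae_advantages([1.0, 1.0, 1.0], [0.0] * 4, discount=0.5, gae_lambda=0.0)
full_sum = gae_advantages([1.0, 1.0, 1.0], [0.0] * 4, discount=0.5, gae_lambda=1.0)
```

Values of gae_lambda close to 1, such as the 0.97 used above, lean towards the full discounted sum, which typically means lower bias and higher variance in the advantage estimate.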
Running Experiments¶
In garage, experiments are run using the “experiment launcher” wrap_experiment, a decorator for Python functions that can be imported directly from the garage package.
from garage import wrap_experiment
Moreover, objects such as the trainer, environment, policy, etc. are commonly used when constructing experiments in garage.
import gym
import torch

from garage import wrap_experiment
from garage.envs import GarageEnv, normalize
from garage.experiment import deterministic, LocalRunner
from garage.torch.algos import TRPO as PyTorch_TRPO
from garage.torch.policies import GaussianMLPPolicy as PyTorch_GMP
from garage.torch.value_functions import GaussianMLPValueFunction


@wrap_experiment
def trpo_garage_pytorch(ctxt, env_id, seed):
    # Seed the global random number generators for reproducibility
    deterministic.set_seed(seed)

    # The runner manages the experiment state described by ctxt
    runner = LocalRunner(ctxt)

    env = GarageEnv(normalize(gym.make(env_id)))

    policy = PyTorch_GMP(env.spec,
                         hidden_sizes=[32, 32],
                         hidden_nonlinearity=torch.tanh,
                         output_nonlinearity=None)

    value_function = GaussianMLPValueFunction(env_spec=env.spec,
                                              hidden_sizes=(32, 32),
                                              hidden_nonlinearity=torch.tanh,
                                              output_nonlinearity=None)

    algo = PyTorch_TRPO(env_spec=env.spec,
                        policy=policy,
                        value_function=value_function,
                        max_episode_length=100,
                        discount=0.99,
                        gae_lambda=0.97)

    runner.setup(algo, env)
    runner.train(n_epochs=999, batch_size=1024)
This page will give you more insight into running experiments.
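The experiment function above takes ctxt as its first parameter, and the launcher is what supplies that argument. To make this calling convention concrete, here is a toy decorator written with the standard library only. It is a sketch under stated assumptions, not garage's wrap_experiment: ToyContext, toy_wrap_experiment and the directory layout are hypothetical names chosen for illustration.

```python
import functools
import os
import tempfile
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Hypothetical stand-in for the object an experiment receives as `ctxt`."""
    snapshot_dir: str
    snapshot_mode: str


def toy_wrap_experiment(log_dir, snapshot_mode='last'):
    """Toy launcher: builds a context and passes it as the first argument."""
    def decorator(experiment_fn):
        @functools.wraps(experiment_fn)
        def launcher(**kwargs):
            # Give each experiment function its own output directory.
            exp_dir = os.path.join(log_dir, experiment_fn.__name__)
            os.makedirs(exp_dir, exist_ok=True)
            ctxt = ToyContext(snapshot_dir=exp_dir,
                              snapshot_mode=snapshot_mode)
            return experiment_fn(ctxt, **kwargs)
        return launcher
    return decorator


@toy_wrap_experiment(log_dir=tempfile.mkdtemp())
def my_experiment(ctxt, seed, lr=0.5):
    # A real experiment would build the runner, env, policy and algo here.
    return ctxt.snapshot_dir, seed, lr


# The caller passes only the experiment's own arguments, never ctxt.
snapshot_dir, seed, lr = my_experiment(seed=1)
```

The point of this design is that the experiment body stays free of logging and directory bookkeeping, which the launcher handles before the function runs.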
Plotting results¶
In garage, we use TensorBoard for plotting experiment results.
This guide provides details on how to set up TensorBoard when running experiments in garage.
Experiment outputs¶
LocalRunner is the state manager for experiments in garage. It is set up to create, save, and restore the state, also known as the snapshot object, at the start of or during an experiment. The snapshot object includes the hyperparameter configuration, training progress, pickled algorithm and environment objects, the TensorBoard event file, etc.
By default, experiment results are written to the relative directory data/local/experiment, in the same directory as the garage package. The output directory is generally organized as follows:
└── data
    └── local
        └── experiment
            └── your_experiment_name
                ├── progress.csv
                ├── debug.log
                ├── variant.json
                ├── metadata.json
                ├── launch_archive.tar.xz
                └── events.out.tfevents.xxx
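Because progress.csv is a plain CSV file, it can be inspected without TensorBoard. The helper below is a minimal sketch using only Python's csv module. The function name read_progress_column is hypothetical, and the column names available depend on what your experiment logs, so treat any column name you pass in as an assumption to verify against the file's header row.

```python
import csv


def read_progress_column(csv_path, column):
    """Return one column of a progress.csv-style file as a list of floats.

    Illustrative helper, not part of garage. Rows in which the column is
    empty or missing are skipped.
    """
    values = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            cell = row.get(column)
            if cell not in (None, ''):
                values.append(float(cell))
    return values
```

For instance, calling read_progress_column on data/local/experiment/your_experiment_name/progress.csv with the name of a logged metric returns that metric's value for each row in which it was recorded.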
wrap_experiment can be invoked with arguments to support actions such as modifying the default output directory, changing the snapshot mode, controlling the snapshot gap, etc. For example, to modify the default output directory and change the snapshot mode from last (only the last iteration is saved) to all, we can do this:
@wrap_experiment(log_dir='./your_log_dir', snapshot_mode='all')
def my_experiment(ctxt, seed, lr=0.5):
    ...
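The effect of the snapshot mode can be pictured with a small, self-contained sketch. It is not garage's snapshotter: the function names, the file names, and the 'gap' mode shown here are assumptions made for illustration, so check the garage documentation for the modes and file names your version actually uses.

```python
def snapshot_file_for(itr, mode, gap=1):
    """Return the (hypothetical) file name written at iteration `itr`.

    Returns None when no snapshot is written for that iteration.
    """
    if mode == 'all':
        # Every iteration gets its own file.
        return 'itr_{}.pkl'.format(itr)
    if mode == 'last':
        # A single file is overwritten at each iteration.
        return 'params.pkl'
    if mode == 'gap':
        # Only every `gap`-th iteration is kept.
        return 'itr_{}.pkl'.format(itr) if itr % gap == 0 else None
    raise ValueError('Unknown snapshot mode: {}'.format(mode))


def files_after_training(n_itrs, mode, gap=1):
    """List the snapshot files left on disk after `n_itrs` iterations."""
    files = set()
    for itr in range(n_itrs):
        name = snapshot_file_for(itr, mode, gap)
        if name is not None:
            files.add(name)
    return sorted(files)


last_files = files_after_training(5, 'last')
all_files = files_after_training(5, 'all')
gap_files = files_after_training(5, 'gap', gap=2)
```

Under these assumptions, 'last' leaves one file regardless of training length, while 'all' grows linearly with the number of iterations, which is worth keeping in mind for long runs with large models.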
During an experiment, garage makes extensive use of the logger from Dowel to log outputs to StdOutput, TextOutput, and/or CsvOutput. For details, you can check this.
Open Source Support¶
Since October 2018, garage has been active in the open-source community, contributing to RL research and development. Contributions from the community are more than welcome.
Resources¶
If you are interested in more in-depth and specific capabilities of garage, you can find many other guides on this website, including but not limited to the following:
This page was authored by Iris Liu (@irisliucy).