Episodic Reward Weighted Regression (ERWR)
Papers | Using Reward-weighted Regression for Reinforcement Learning of Task Space Control [1]; Policy Search for Motor Primitives in Robotics [2]
Framework(s) | TensorFlow
API Reference | `garage.tf.algos.ERWR`
Code | `erwr.py`
Examples | erwr_cartpole
Episodic Reward Weighted Regression (ERWR) is an extension of the original RWR algorithm, which uses a linear policy to solve the immediate-reward learning problem. The extension implemented here applies RWR to episodic reinforcement learning, weighting whole trajectories by their returns. To read more about both algorithms, see the cited papers or the summary provided in this text.
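The core idea can be sketched in a few lines of NumPy: collect episodes, turn each episode's return into a positive weight, and refit the policy by return-weighted regression so that high-return episodes dominate the fit. The toy task, the linear-Gaussian policy `a = theta @ s + noise`, and the exponential weight transformation below are illustrative assumptions, not garage's implementation (garage's ERWR instead centers and clips per-step advantages, as the defaults below show):

```python
import numpy as np

# Toy sketch of episodic reward-weighted regression, assuming a
# hypothetical linear-Gaussian policy on a made-up task whose reward
# favors actions that match s[0] + s[1] (optimal theta is [1, 1]).
rng = np.random.default_rng(0)
horizon, n_episodes = 10, 50


def rollout(theta):
    """Collect one episode under the current policy."""
    states, actions, ret = [], [], 0.0
    for _ in range(horizon):
        s = rng.normal(size=2)
        a = theta @ s + rng.normal(scale=0.5)
        ret += -(a - s.sum())**2
        states.append(s)
        actions.append(a)
    return np.array(states), np.array(actions), ret


theta = np.zeros(2)
mean_returns = []
for _ in range(20):
    batch = [rollout(theta) for _ in range(n_episodes)]
    returns = np.array([r for _, _, r in batch])
    mean_returns.append(returns.mean())
    # Exponentiated, shifted returns as weights: one common way to make
    # the regression weights positive.
    w = np.exp((returns - returns.max()) / (returns.std() + 1e-8))
    S = np.vstack([s for s, _, _ in batch])
    A = np.concatenate([a for _, a, _ in batch])
    W = np.repeat(w, horizon)  # every step inherits its episode's weight
    # Return-weighted least squares: theta = (S^T W S)^{-1} S^T W A
    theta = np.linalg.solve(S.T @ (W[:, None] * S), S.T @ (W * A))
```

Because the policy update is a (weighted) supervised regression rather than a gradient step, each iteration reuses standard least-squares machinery; the mean return improves as the weights concentrate on better episodes.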
Default Parameters
scope=None
discount=0.99
gae_lambda=1
center_adv=True
positive_adv=True
fixed_horizon=False
lr_clip_range=0.01
max_kl_step=0.01
optimizer=None
optimizer_args=None
policy_ent_coeff=0.0
use_softplus_entropy=False
use_neg_logli_entropy=False
stop_entropy_gradient=False
entropy_method='no_entropy'
name='ERWR'
Examples

erwr_cartpole
```python
#!/usr/bin/env python3
"""This is an example to train a task with the ERWR algorithm.

Here it runs CartPole-v1 on ERWR with 100 iterations.

Results:
    AverageReturn: 100
    RiseTime: itr 34

"""
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.np.baselines import LinearFeatureBaseline
from garage.tf.algos import ERWR
from garage.tf.policies import CategoricalMLPPolicy
from garage.trainer import TFTrainer


@wrap_experiment
def erwr_cartpole(ctxt=None, seed=1):
    """Train with ERWR on the CartPole-v1 environment.

    Args:
        ctxt (garage.experiment.ExperimentContext): The experiment
            configuration used by Trainer to create the snapshotter.
        seed (int): Used to seed the random number generator to produce
            determinism.

    """
    set_seed(seed)
    with TFTrainer(snapshot_config=ctxt) as trainer:
        env = GymEnv('CartPole-v1')

        policy = CategoricalMLPPolicy(name='policy',
                                      env_spec=env.spec,
                                      hidden_sizes=(32, 32))

        baseline = LinearFeatureBaseline(env_spec=env.spec)

        algo = ERWR(env_spec=env.spec,
                    policy=policy,
                    baseline=baseline,
                    discount=0.99)

        trainer.setup(algo=algo, env=env)
        trainer.train(n_epochs=100, batch_size=10000, plot=False)


erwr_cartpole(seed=1)
```
References

[1] J. Peters and S. Schaal. Using reward-weighted regression for reinforcement learning of task space control. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 262–267, 2007.

[2] J. Kober and J. Peters. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems 21 (NIPS 2008), pages 849–856, 2009.
This page was authored by Mishari Aliesa (@maliesa96).