Generalized Hindsight for Reinforcement Learning

Authors: Alexander Li, Lerrel Pinto, Pieter Abbeel

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We test our algorithm on several multi-task control tasks, and find that AIR consistently achieves higher asymptotic performance using as few as 20% of the environment interactions of our baselines. We also introduce a computationally more efficient version, which relabels by comparing trajectory rewards to a learned baseline, that also achieves higher asymptotic performance than our baselines.' (Section 4, Experimental Evaluation)
Researcher Affiliation | Academia | Alexander C. Li, University of California, Berkeley (alexli1@berkeley.edu); Lerrel Pinto, New York University (lerrel@cs.nyu.edu); Pieter Abbeel, University of California, Berkeley (pabbeel@cs.berkeley.edu)
Pseudocode | Yes | Algorithm 1: Generalized Hindsight; Algorithm 2: S_IRL (Approximate IRL); Algorithm 3: S_A (Trajectory Advantage) (see the sketch after the table)
Open Source Code | No | Website: sites.google.com/view/generalized-hindsight (the website states 'Code: TBA', indicating the code is not yet available)
Open Datasets | Yes | 'These environments will be released for open-source access.'
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, exact percentages, sample counts, or citations to predefined splits.
Hardware Specification | No | The paper states, 'We thank AWS for computing resources,' but does not specify any particular GPU models, CPU models, or other detailed hardware used for running the experiments.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC), the Adam optimizer, and OpenAI Gym, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | 'In our experiments, we simply select m = 1 task out of K = 100 sampled task variables for all environments and both relabeling strategies. [...] We found that a batch size of 256 for all Half Cheetah experiments and 128 for others was optimal. We use the Adam optimizer with a learning rate of 3e-4 for all networks.' (see the configuration sketch after the table)
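
The Pseudocode row above points to Algorithm 1 (Generalized Hindsight) and the two relabeling strategies. Below is a minimal Python sketch of the core idea under the advantage-style strategy: after a trajectory is collected, score K candidate task variables by the trajectory's return minus a learned baseline, keep the top m, and store the transitions relabeled with those tasks' rewards. The helper names (sample_tasks, reward_fn, baseline, replay_buffer) are hypothetical placeholders, since the authors' code has not been released.

# Sketch only, not the authors' implementation: Generalized Hindsight
# relabeling with an advantage-style strategy. All helpers are assumed.
import numpy as np

def advantage_relabel(trajectory, sample_tasks, reward_fn, baseline, K=100, m=1):
    # trajectory: list of (state, action, next_state) tuples
    # sample_tasks(K): draws K candidate task variables z from the task distribution
    # reward_fn(s, a, z): task-conditioned reward
    # baseline(s0, z): learned estimate of the expected return from s0 under task z
    candidates = sample_tasks(K)
    s0 = trajectory[0][0]
    scores = []
    for z in candidates:
        ret = sum(reward_fn(s, a, z) for s, a, _ in trajectory)
        scores.append(ret - baseline(s0, z))  # how much better than expected the trajectory is under z
    best = np.argsort(scores)[-m:]            # keep the m tasks the trajectory "solves" best
    return [candidates[i] for i in best]

def store_relabeled(trajectory, new_tasks, reward_fn, replay_buffer):
    # Recompute rewards under each relabeled task and add the transitions to the
    # off-policy replay buffer (e.g., for a SAC learner).
    for z in new_tasks:
        for s, a, s_next in trajectory:
            replay_buffer.add(s, a, reward_fn(s, a, z), s_next, z)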
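
For the Experiment Setup row, the quoted hyperparameters can be gathered into one illustrative configuration. The field names are invented for this sketch; the paper does not publish a config file.

# Hyperparameters quoted in the Experiment Setup row; names are assumptions.
HINDSIGHT_CONFIG = {
    "num_candidate_tasks": 100,  # K: task variables sampled per relabeling step
    "num_relabeled_tasks": 1,    # m: tasks kept per trajectory
}

TRAINING_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 3e-4,       # used for all networks
    "batch_size": {"HalfCheetah": 256, "default": 128},  # 256 for Half Cheetah, 128 otherwise
}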