Adversarial Intrinsic Motivation for Reinforcement Learning

Authors: Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that this reward function changes smoothly with respect to transitions in the MDP and directs the agent's exploration to find the goal efficiently. Additionally, we combine AIM with Hindsight Experience Replay (HER) and show that the resulting algorithm accelerates learning significantly on several simulated robotics tasks when compared to other rewards that encourage exploration or accelerate learning. (See the first sketch after this table.)
Researcher Affiliation | Collaboration | Ishan Durugkar, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA 78703, ishand@cs.utexas.edu; Mauricio Tec, Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX, USA 78703, mauriciogtec@utexas.edu; Scott Niekum, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA 78703, sniekum@cs.utexas.edu; Peter Stone, Department of Computer Science, The University of Texas at Austin, Austin, TX, USA 78703, and Sony AI, pstone@cs.utexas.edu
Pseudocode | Yes | The basic procedure to learn and use adversarial intrinsic motivation (AIM) is laid out in Algorithm 1, which also includes how to use the algorithm in conjunction with HER. (See the relabelling sketch after this table.)
Open Source Code | No | The paper states "We used the HER implementation using Twin Delayed DDPG (TD3) [26] as the underlying RL algorithm from the stable baselines repository [38]" but does not provide a link to, or a statement about, its own open-source code for AIM.
Open Datasets | Yes | The Fetch robot tasks from OpenAI Gym [15], which have been used to evaluate learning of goal-conditioned policies previously [1, 80]. Descriptions of these tasks and their goal space are in Appendix H. We soften the Dirac target distribution for continuous states to instead be a Gaussian with variance of 0.01 of the range of each feature. (See the goal-softening sketch after this table.)
Dataset Splits | No | The paper mentions "We did an extensive sweep of the hyperparameters for the baseline HER + R (laid out in Appendix H), with a coarser search on relevant hyperparameters for AIM." This indicates hyperparameter tuning, but it does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, and testing.
Hardware Specification | No | The paper describes experiments in "simulated robotics tasks" and the "MuJoCo simulator", but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | We used the HER implementation using Twin Delayed DDPG (TD3) [26] as the underlying RL algorithm from the stable baselines repository [38]. While 'stable-baselines' is mentioned, no specific version number for it or other software dependencies is provided. (See the HER + TD3 setup sketch after this table.)
Experiment Setup | Yes | We did an extensive sweep of the hyperparameters for the baseline HER + R (laid out in Appendix H), with a coarser search on relevant hyperparameters for AIM.
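The sketches below illustrate a few of the items in the table; they are reconstructions under stated assumptions, not code released with the paper. First, the Research Type row quotes the claim that the AIM reward changes smoothly across transitions and directs exploration toward the goal. A minimal sketch of that idea, assuming the adversarially trained potential formulation the paper describes; the network sizes, penalty weight, and exact objective terms are assumptions here:

```python
# Minimal sketch of an adversarially trained potential yielding a smooth,
# goal-directed intrinsic reward in the spirit of AIM. Network sizes, the
# penalty weight, and the exact objective terms are illustrative assumptions.
import torch
import torch.nn as nn

class Potential(nn.Module):
    """f(s, g): higher values are intended to mean 'closer to the goal g'."""
    def __init__(self, state_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

def aim_discriminator_loss(f, s, s_next, goal_states, g, penalty_weight=10.0):
    """Push f up on (softened) goal states and down on visited states, while
    penalising transitions whose potential jumps by more than one step."""
    gap = f(goal_states, g).mean() - f(s_next, g).mean()
    step = f(s_next, g) - f(s, g)
    smoothness_penalty = torch.clamp(step.abs() - 1.0, min=0.0).pow(2).mean()
    return -(gap - penalty_weight * smoothness_penalty)

def aim_reward(f, s, s_next, g):
    """Intrinsic reward for a transition: the increase in potential."""
    with torch.no_grad():
        return f(s_next, g) - f(s, g)
```

Under this reading, the penalty discourages the potential from changing by more than roughly one unit per transition, which is one way to interpret the "changes smoothly with respect to transitions" claim.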
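The Pseudocode row notes that Algorithm 1 combines AIM with HER. The hindsight-relabelling step of that combination can be sketched as follows; the transition dictionary keys and the default number of sampled goals are assumptions in the spirit of HER, not a transcription of Algorithm 1:

```python
# Sketch of "future"-strategy hindsight relabelling, the HER component that
# Algorithm 1 combines with AIM. The transition dict keys and the default
# number of sampled goals are assumptions, not a transcription of the paper.
import random

def relabel_with_future_goals(episode, n_sampled_goal=4):
    """For each transition, add copies whose desired goal is the achieved goal
    of a state reached later in the same episode."""
    relabeled = []
    for t, transition in enumerate(episode):
        future = episode[t:]
        for _ in range(n_sampled_goal):
            later = random.choice(future)
            new = dict(transition)
            new["desired_goal"] = later["achieved_goal"]
            relabeled.append(new)
    return relabeled
```

In the full procedure, the relabelled transitions would be scored with the learned AIM reward rather than the environment's sparse reward.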
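The Open Datasets row quotes the softening of the Dirac goal target into a Gaussian with variance 0.01 of each feature's range. A sketch of one reading of that statement; the environment id (and the old-Gym reset signature) and the illustrative per-feature range are assumptions, since the paper does not say how the range was computed:

```python
# Sketch of one reading of the quoted goal-target softening: replace the Dirac
# at the goal with a Gaussian whose per-feature variance is 0.01 of that
# feature's range. The environment id and the illustrative range are assumptions.
import gym
import numpy as np

env = gym.make("FetchReach-v1")   # requires gym's MuJoCo robotics extras
obs = env.reset()                 # dict: observation / achieved_goal / desired_goal
goal = obs["desired_goal"]

def sample_softened_goal_states(goal, feature_range, n=64):
    """Sample target states around the goal instead of using a Dirac at it."""
    std = np.sqrt(0.01 * feature_range)      # variance = 0.01 * range
    return goal + np.random.randn(n, goal.shape[0]) * std

# The per-feature range would come from the task's goal space or observed data;
# the value below is purely illustrative.
targets = sample_softened_goal_states(goal, feature_range=np.full_like(goal, 0.3))
```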
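Finally, the Open Source Code and Software Dependencies rows point to the stable-baselines HER implementation with TD3. A minimal sketch of that baseline setup, assuming the stable-baselines HER wrapper API; the paper does not state a version, and AIM's learned reward would require an additional custom reward hook that is not shown here:

```python
# Sketch of the HER + TD3 baseline setup the paper says it built on, using the
# stable-baselines HER wrapper. No version is stated in the paper, and AIM's
# learned intrinsic reward is not included in this snippet.
import gym
from stable_baselines import HER, TD3

env = gym.make("FetchPush-v1")               # any goal-conditioned Fetch task
model = HER(
    "MlpPolicy",
    env,
    model_class=TD3,                         # underlying off-policy RL algorithm
    n_sampled_goal=4,                        # hindsight goals per real transition
    goal_selection_strategy="future",
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```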