Learning Guidance Rewards with Trajectory-space Smoothing

Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section evaluates our approach on various single-agent and multi-agent RL tasks to quantify the benefits of using the guidance rewards in place of the environmental rewards, when the latter are sparse or delayed. [...] Figure 3 plots the learning curves for all the algorithms with episodic rewards.
Researcher Affiliation | Academia | Tanmay Gangwani, Dept. of Computer Science, UIUC, gangwan2@illinois.edu; Yuan Zhou, Dept. of ISE, UIUC, yuanz@illinois.edu; Jian Peng, Dept. of Computer Science, UIUC, jianpeng@illinois.edu
Pseudocode | Yes | Algorithm 1: Tabular Q-learning with IRCR; Algorithm 2: Soft Actor-Critic with IRCR (a minimal sketch of Algorithm 1 appears after this table)
Open Source Code | Yes | Code for this paper is available at https://github.com/tgangwani/GuidanceRewards
Open Datasets | Yes | We benchmark high-dimensional, continuous-control locomotion tasks based on the MuJoCo physics simulator, provided in OpenAI Gym [3] [...] We adopt the Rover Domain from Rahmattalabi et al. [20]. (An episodic-reward wrapper sketch appears after this table.)
Dataset Splits | No | No explicit train/validation/test dataset splits are provided for the MuJoCo or Rover Domain environments; these are simulation environments where data is generated dynamically rather than drawn from pre-defined static datasets with fixed splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | The paper mentions the MuJoCo physics simulator, OpenAI Gym, and various RL algorithms (Q-learning, Actor-Critic, TD3, SAC, Distributional RL), but does not specify software names with version numbers for reproducibility.
Experiment Setup | Yes | Please see Appendix A.2 for hyperparameters and other details. [...] We experiment with different values for N, K, and the coupling factor (Appendix A.2).
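To make the Pseudocode row concrete, below is a minimal Python sketch of what Algorithm 1 (tabular Q-learning with IRCR) might look like, assuming the guidance reward for every transition in a trajectory is that trajectory's episodic return normalized by the minimum and maximum returns observed so far. The toy environment interface (reset, step, sample_action, num_actions) and all hyperparameter values are illustrative placeholders, not taken from the paper or its Appendix A.2.

```python
import random
from collections import defaultdict

def tabular_q_learning_with_ircr(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning where the (sparse/delayed) environment reward is
    replaced by an IRCR-style guidance reward: every transition in a
    trajectory is credited with the trajectory's episodic return, normalized
    by the minimum and maximum returns observed so far."""
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    ret_min, ret_max = float("inf"), float("-inf")

    for _ in range(episodes):
        s, done = env.reset(), False
        trajectory, episodic_return = [], 0.0

        while not done:
            # epsilon-greedy action selection over the tabular Q-values
            if random.random() < eps:
                a = env.sample_action()
            else:
                a = max(range(env.num_actions), key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)
            trajectory.append((s, a, s_next, done))
            episodic_return += r                 # env reward only forms the return
            s = s_next

        # normalize the episodic return against the running min/max returns
        ret_min = min(ret_min, episodic_return)
        ret_max = max(ret_max, episodic_return)
        guidance = (episodic_return - ret_min) / (ret_max - ret_min + 1e-8)

        # Q-updates are driven by the dense guidance reward, not the env reward
        for (s, a, s_next, terminal) in trajectory:
            bootstrap = 0.0 if terminal else gamma * max(
                Q[(s_next, x)] for x in range(env.num_actions))
            Q[(s, a)] += alpha * (guidance + bootstrap - Q[(s, a)])
    return Q
```

Algorithm 2 in the paper applies the same reward substitution within Soft Actor-Critic; the sketch above covers only the tabular case.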
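Regarding the Open Datasets row: the delayed/episodic-reward tasks are not separate datasets but standard Gym environments whose per-step reward is withheld until termination. Below is a minimal wrapper sketch under that assumption, using the classic 4-tuple Gym step API of that era; the paper's exact environment code is in the linked repository.

```python
import gym

class EpisodicRewardWrapper(gym.Wrapper):
    """Withhold per-step rewards and pay out the accumulated episodic
    return on the final step, producing the delayed-reward setting."""

    def __init__(self, env):
        super().__init__(env)
        self._accumulated = 0.0

    def reset(self, **kwargs):
        self._accumulated = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._accumulated += reward
        # zero reward on intermediate steps, the full return at episode end
        return obs, (self._accumulated if done else 0.0), done, info

# Usage (hypothetical environment id): env = EpisodicRewardWrapper(gym.make("HalfCheetah-v2"))
```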