Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Guidance Rewards with Trajectory-space Smoothing
Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates our approach on various single-agent and multi-agent RL tasks to quantify the benefits of using the guidance rewards in place of the environmental rewards, when the latter are sparse or delayed. [...] Figure 3 plots the learning curves for all the algorithms with episodic rewards. |
| Researcher Affiliation | Academia | Tanmay Gangwani Dept. of Computer Science UIUC EMAIL Yuan Zhou Dept. of ISE UIUC EMAIL Jian Peng Dept. of Computer Science UIUC EMAIL |
| Pseudocode | Yes | Algorithm 1: Tabular Q-learning with IRCR, Algorithm 2: Soft Actor-Critic with IRCR |
| Open Source Code | Yes | Code for this paper is available at https://github.com/tgangwani/GuidanceRewards |
| Open Datasets | Yes | We benchmark high-dimensional, continuous-control locomotion tasks based on the MuJoCo physics simulator, provided in OpenAI Gym [3] [...] We adopt the Rover Domain from Rahmattalabi et al. [20]. |
| Dataset Splits | No | No explicit train/validation/test dataset splits are provided for the MuJoCo or Rover Domain environments. These are simulation environments where data is generated dynamically, rather than using pre-defined static datasets with fixed splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using the MuJoCo physics simulator and OpenAI Gym, and various RL algorithms (Q-learning, Actor-Critic, TD3, SAC, Distributional-RL), but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Please see Appendix A.2 for hyperparameters and other details. [...] We experiment with different values for N, K, and the coupling factor (Appendix A.2). |