Learning Guidance Rewards with Trajectory-space Smoothing
Authors: Tanmay Gangwani, Yuan Zhou, Jian Peng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section evaluates our approach on various single-agent and multi-agent RL tasks to quantify the benefits of using the guidance rewards in place of the environmental rewards, when the latter are sparse or delayed. [...] Figure 3 plots the learning curves for all the algorithms with episodic rewards. |
| Researcher Affiliation | Academia | Tanmay Gangwani, Dept. of Computer Science, UIUC, gangwan2@illinois.edu; Yuan Zhou, Dept. of ISE, UIUC, yuanz@illinois.edu; Jian Peng, Dept. of Computer Science, UIUC, jianpeng@illinois.edu |
| Pseudocode | Yes | Algorithm 1: Tabular Q-learning with IRCR, Algorithm 2: Soft Actor-Critic with IRCR |
| Open Source Code | Yes | Code for this paper is available at https://github.com/tgangwani/GuidanceRewards |
| Open Datasets | Yes | We benchmark high-dimensional, continuous-control locomotion tasks based on the MuJoCo physics simulator, provided in OpenAI Gym [3] [...] We adopt the Rover Domain from Rahmattalabi et al. [20]. |
| Dataset Splits | No | No explicit train/validation/test dataset splits are provided for the MuJoCo or Rover Domain environments. These are simulation environments where data is generated dynamically, rather than using pre-defined static datasets with fixed splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions using the MuJoCo physics simulator and OpenAI Gym, and various RL algorithms (Q-learning, Actor-Critic, TD3, SAC, Distributional-RL), but does not specify any software names with version numbers for reproducibility. |
| Experiment Setup | Yes | Please see Appendix A.2 for hyperparameters and other details. [...] We experiment with different values for N, K, and the coupling factor (Appendix A.2). |