Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards
Authors: Rati Devidze, Parameswaran Kamalaruban, Adish Singla
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on several environments with sparse/noisy reward signals demonstrate the effectiveness of EXPLORS. |
| Researcher Affiliation | Academia | Max Planck Institute for Software Systems (MPI-SWS), Saarbrücken, Germany; The Alan Turing Institute, London, UK |
| Pseudocode | Yes | Algorithm 1 (Online Reward Shaping) |
| Open Source Code | Yes | GitHub repo: https://github.com/machine-teaching-group/neurips2022_exploration-guided-reward-shaping |
| Open Datasets | No | The paper describes custom environments (CHAIN, ROOM, LINEK) but does not provide concrete access information (link, DOI, or specific citation) indicating that these environments are publicly available. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits; it instead describes periodic evaluation during training based on extrinsic rewards. |
| Hardware Specification | No | The paper states "Details are provided in appendices" regarding compute resources, but no specific hardware details are given in the main text. |
| Software Dependencies | No | The paper mentions using a "tabular REINFORCE agent [7]", a "tabular Q-learning agent [7]", and a "neural REINFORCE agent", but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We give an overview of main results here, and provide a more detailed description of the setup and additional implementation details in appendices. |