Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
Authors: Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham Kakade, Sergey Levine
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We confirm that these results hold in practice in an experimental evaluation, providing an insight into the mechanisms through which reward shaping can significantly improve the complexity of reinforcement learning while retaining asymptotic performance. ... (Section 7, Numerical Simulations) To show the practical relevance of our analysis on reward shaping we perform some numerical simulations on a family of maze environments with tabular state-action representations, as shown in Fig. 3. |
| Researcher Affiliation | Collaboration | Abhishek Gupta (University of Washington); Aldo Pacchiano (Microsoft Research, NYC); Yuexiang Zhai (UC Berkeley, EECS); Sham M. Kakade (Harvard University); Sergey Levine (UC Berkeley, EECS) |
| Pseudocode | Yes | Algorithm 1 UCBVI Shaped and Algorithm 2 Value Iteration with Projection |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix A for link to URL and run instructions in the README in the github repo. |
| Open Datasets | Yes | We used three open source domains and collected our own data on these domains. |
| Dataset Splits | No | The paper describes the experimental setup and numerical simulations but does not explicitly detail training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | We perform our experiments on a linux machine running Ubuntu 20.04.2 LTS with 64GB of RAM and Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. |
| Software Dependencies | Yes | The code is written in Python 3.8.10, and uses standard libraries such as numpy 1.21.6 and matplotlib 3.5.2. |
| Experiment Setup | Yes | Ṽ is constructed by scaling the optimal value function V⋆ by per-state scaling factors sampled independently within the range b. ... with various levels of imperfect shaping applied by varying b ∈ {1.5, 1.9}. |
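The Experiment Setup row describes building a shaped value estimate Ṽ by scaling the optimal value function V⋆ with independent per-state factors bounded by b. A minimal sketch of that construction, assuming the factors are drawn uniformly from [1, b] (the exact sampling range, and all function and variable names, are assumptions, not the authors' code):

```python
import numpy as np

def shaped_value_estimate(v_star, b, rng):
    """Build a shaped estimate V_tilde by scaling V_star with
    independent per-state factors drawn uniformly from [1, b)."""
    factors = rng.uniform(1.0, b, size=v_star.shape)
    return v_star * factors

rng = np.random.default_rng(0)
v_star = np.ones(5)  # toy optimal value function over 5 states
v_tilde = shaped_value_estimate(v_star, b=1.5, rng=rng)
```

Under this reading, varying b over {1.5, 1.9} controls how far the imperfect shaping term Ṽ can deviate from V⋆.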