Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
Authors: Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham Kakade, Sergey Levine
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We confirm that these results hold in practice in an experimental evaluation, providing an insight into the mechanisms through which reward shaping can significantly improve the complexity of reinforcement learning while retaining asymptotic performance. Section 7, Numerical Simulations: To show the practical relevance of our analysis on reward shaping we perform some numerical simulations on a family of maze environments with tabular state-action representations, as shown in Fig. 3. |
| Researcher Affiliation | Collaboration | Abhishek Gupta (University of Washington, abhgupta@cs.washington.edu); Aldo Pacchiano (Microsoft Research, NYC, apacchiano@microsoft.com); Yuexiang Zhai (UC Berkeley, EECS, simonzhai@berkeley.edu); Sham M. Kakade (Harvard University, sham@seas.harvard.edu); Sergey Levine (UC Berkeley, EECS, svlevine@eecs.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1 UCBVI Shaped and Algorithm 2 Value Iteration with Projection |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix A for link to URL and run instructions in the README in the github repo. |
| Open Datasets | Yes | We used three open source domains and collected our own data on these domains. |
| Dataset Splits | No | The paper describes the experimental setup and numerical simulations but does not explicitly detail training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | We perform our experiments on a linux machine running Ubuntu 20.04.2 LTS with 64GB of RAM and Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. |
| Software Dependencies | Yes | The code is written in Python 3.8.10, and uses standard libraries such as numpy 1.21.6 and matplotlib 3.5.2. |
| Experiment Setup | Yes | $\widetilde{V}$ is constructed by scaling the optimal value function $V^\star$ by per-state scaling factors sampled independently within the range b. ... with various levels of imperfect shaping applied by varying b = {1.5, 1.9}. |
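
The construction quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_shaped_value`, the toy `V_star` array, and in particular the sampling interval are assumptions. The quoted text only says the factors are sampled "within the range b", so the sketch assumes a uniform draw from [1/b, b]; the paper's repository may define the range differently.

```python
import numpy as np


def make_shaped_value(V_star, b, rng):
    """Return an imperfect shaped value estimate by per-state scaling of V*.

    ASSUMPTION: each state's factor is drawn independently and uniformly
    from [1/b, b]. The source only states "within the range b".
    """
    V_star = np.asarray(V_star, dtype=float)
    # One independent scaling factor per state.
    factors = rng.uniform(1.0 / b, b, size=V_star.shape)
    return factors * V_star


# Hypothetical optimal values for a 4-state toy problem (not from the paper).
V_star = np.array([0.0, 0.25, 0.5, 1.0])
rng = np.random.default_rng(0)

# The two shaping levels quoted in the table.
for b in (1.5, 1.9):
    V_tilde = make_shaped_value(V_star, b, rng)
    print(f"b={b}: {V_tilde}")
```

Under this assumption, a larger b widens the interval the factors are drawn from, so the shaped estimate can deviate further from the optimal value function, and b = 1 recovers it exactly.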