Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Authors: Abhishek Gupta, Aldo Pacchiano, Yuexiang Zhai, Sham Kakade, Sergey Levine

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We confirm that these results hold in practice in an experimental evaluation, providing an insight into the mechanisms through which reward shaping can significantly improve the complexity of reinforcement learning while retaining asymptotic performance." and "7 Numerical Simulations: To show the practical relevance of our analysis on reward shaping we perform some numerical simulations on a family of maze environments with tabular state-action representations, as shown in Fig. 3."
Researcher Affiliation | Collaboration | Abhishek Gupta, University of Washington (abhgupta@cs.washington.edu); Aldo Pacchiano, Microsoft Research, NYC (apacchiano@microsoft.com); Yuexiang Zhai, UC Berkeley, EECS (simonzhai@berkeley.edu); Sham M. Kakade, Harvard University (sham@seas.harvard.edu); Sergey Levine, UC Berkeley, EECS (svlevine@eecs.berkeley.edu)
Pseudocode | Yes | Algorithm 1 (UCBVI Shaped) and Algorithm 2 (Value Iteration with Projection)
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Appendix A for link to URL and run instructions in the README in the github repo."
Open Datasets | Yes | "We used three open source domains and collected our own data on these domains."
Dataset Splits | No | The paper describes the experimental setup and numerical simulations but does not explicitly detail training, validation, and test dataset splits with percentages or counts.
Hardware Specification | Yes | "We perform our experiments on a linux machine running Ubuntu 20.04.2 LTS with 64GB of RAM and Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz."
Software Dependencies | Yes | "The code is written in Python 3.8.10, and uses standard libraries such as numpy 1.21.6 and matplotlib 3.5.2."
Experiment Setup | Yes | "Ṽ is constructed by scaling the optimal value function V* by per-state scaling factors sampled independently within the range b. ... with various levels of imperfect shaping applied by varying b = {1.5, 1.9}."
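The construction quoted in the Experiment Setup row can be sketched in a few lines of numpy (matching the paper's stated Python/numpy stack). This is a minimal illustration, not the authors' code: the paper only says per-state factors are "sampled independently within the range b", so the uniform distribution on [1, b] and the function name below are assumptions.

```python
import numpy as np

def shape_value_function(v_star, b, rng=None):
    """Build an imperfectly shaped value estimate from the optimal V*.

    Each state's value is multiplied by an independent scaling factor.
    Assumption: factors are drawn uniformly from [1, b]; the source only
    states they are "sampled independently within the range b".
    """
    rng = np.random.default_rng() if rng is None else rng
    scales = rng.uniform(1.0, b, size=v_star.shape)  # one factor per state
    return scales * v_star

# Example at the two imperfection levels b = {1.5, 1.9} used in the paper.
v_star = np.array([1.0, 0.5, 0.25, 0.0])  # toy tabular optimal values
for b in (1.5, 1.9):
    v_tilde = shape_value_function(v_star, b)
    # With factors >= 1 and V* >= 0, shaping only over-estimates V*.
    assert np.all(v_tilde >= v_star)
```

Varying b then controls how far the shaping term Ṽ can drift from V*, which is the knob the numerical simulations sweep.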