A Boosting Approach to Reinforcement Learning
Authors: Nataly Brukhim, Elad Hazan, Karan Singh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our results, we check whether the proposed algorithm is indeed capable of boosting the accuracy of concrete instantiations of weak learners. We evaluated these on the Cart Pole and the Lunar Lander environments. The results demonstrate that the proposed RL boosting algorithm succeeds in maximizing rewards while using only a few weak learners (equivalently, within a few rounds of boosting). |
| Researcher Affiliation | Collaboration | Nataly Brukhim (Princeton University, nbrukhim@cs.princeton.edu); Elad Hazan (Princeton University and Google AI Princeton, ehazan@cs.princeton.edu); Karan Singh (Carnegie Mellon University, karansingh@cmu.edu) |
| Pseudocode | Yes | Algorithm 1 RL Boosting, Algorithm 2 Internal Boost, and Algorithm 3 Trajectory Sampler are presented in the paper. |
| Open Source Code | Yes | The paper's reproducibility checklist answers 'Yes' to: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?' |
| Open Datasets | No | We evaluated these on the Cart Pole and the Lunar Lander environments. The paper refers to environments that generate data, not fixed datasets with access information or citations. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It only mentions that 'reward is computed over 100 episodes of interactions', which describes evaluation rather than data partitioning for model training. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, nor does it mention specific GPU/CPU models or cloud resources. |
| Software Dependencies | No | The paper mentions 'Scikit-Learn [30]' but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | Yes | Throughout all the experiments, we used γ = 0.9. To speed up computation, the plots in the paper were generated by retaining only the 3 most recent policies of every iteration in the policy mixture. Illustrative sketches of such a boosting loop and its evaluation protocol appear below the table. |
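
To make the setup above concrete, the following is a minimal sketch of a boosting-style RL loop that uses a scikit-learn regressor as the weak learner on CartPole. It is not the paper's Algorithm 1 (RL Boosting) or Algorithm 3 (Trajectory Sampler); the round count, the `rollout` helper, the per-action regression-tree weak learner, and the mixture-truncation constant are all illustrative assumptions, and the gymnasium API is assumed for the environment.

```python
# Illustrative skeleton only: each boosting round fits a shallow scikit-learn
# regressor (the "weak learner") to Monte Carlo returns and adds the resulting
# greedy policy to a mixture. Hypothetical names: rollout, greedy_policy,
# N_ROUNDS, EPISODES_PER_ROUND, KEEP_LAST.
import gymnasium as gym
import numpy as np
from sklearn.tree import DecisionTreeRegressor

GAMMA = 0.9           # discount factor, matching the experiment-setup row above
N_ROUNDS = 5          # "few weak learners" / few boosting rounds (assumed value)
EPISODES_PER_ROUND = 20
KEEP_LAST = 3         # retain only the 3 most recent policies in the mixture


def rollout(env, policy, n_episodes):
    """Collect (state, action, discounted-return) tuples under `policy`."""
    states, actions, returns = [], [], []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        traj, done = [], False
        while not done:
            a = policy(obs)
            next_obs, r, terminated, truncated, _ = env.step(a)
            traj.append((obs, a, r))
            obs = next_obs
            done = terminated or truncated
        g = 0.0
        for s, a, r in reversed(traj):   # backward pass for discounted returns
            g = r + GAMMA * g
            states.append(s)
            actions.append(a)
            returns.append(g)
    return np.array(states), np.array(actions), np.array(returns)


def greedy_policy(models):
    """Greedy policy w.r.t. one fitted regressor per action."""
    def act(obs):
        scores = [m.predict(obs.reshape(1, -1))[0] for m in models]
        return int(np.argmax(scores))
    return act


env = gym.make("CartPole-v1")
n_actions = env.action_space.n
mixture = [lambda obs: env.action_space.sample()]   # start from a random policy

for _ in range(N_ROUNDS):
    # Sample trajectories from a uniformly random member of the current mixture.
    behavior = mixture[np.random.randint(len(mixture))]
    S, A, G = rollout(env, behavior, EPISODES_PER_ROUND)

    # Weak learner: one depth-limited tree per action, fit to observed returns.
    models = []
    for a in range(n_actions):
        mask = A == a
        tree = DecisionTreeRegressor(max_depth=3)
        if mask.any():
            tree.fit(S[mask], G[mask])
        else:
            tree.fit(S, np.zeros(len(S)))    # fallback when an action is unseen
        models.append(tree)

    mixture.append(greedy_policy(models))
    mixture = mixture[-KEEP_LAST:]           # keep the 3 most recent policies
```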
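The evaluation protocol quoted in the Dataset Splits row ("reward is computed over 100 episodes of interactions") can be sketched in the same assumed setting. The `evaluate` helper and the rule of sampling one mixture member per episode are assumptions, reusing `env` and `mixture` from the sketch above.

```python
# Illustrative evaluation sketch: average undiscounted episode reward of the
# final policy mixture over 100 episodes.
def evaluate(env, mixture, n_episodes=100):
    totals = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        policy = mixture[np.random.randint(len(mixture))]  # sample a mixture member
        total, done = 0.0, False
        while not done:
            obs, r, terminated, truncated, _ = env.step(policy(obs))
            total += r
            done = terminated or truncated
        totals.append(total)
    return float(np.mean(totals))

print("mean reward over 100 episodes:", evaluate(env, mixture))
```

Sampling a single mixture member at the start of each episode is one common way to execute a mixture policy; the paper's own trajectory-sampling procedure (Algorithm 3) may differ in detail.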