SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
Authors: Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems. and We tested SBEED across multiple continuous control tasks from the OpenAI Gym benchmark (Brockman et al., 2016) using the MuJoCo simulator (Todorov et al., 2012), including Pendulum-v0, InvertedDoublePendulum-v1, HalfCheetah-v1, Swimmer-v1, and Hopper-v1. |
| Researcher Affiliation | Collaboration | 1 Georgia Institute of Technology, 2 Google Inc., 3 Microsoft Research, 4 University of Illinois at Urbana-Champaign, 5 Tencent AI Lab. |
| Pseudocode | Yes | Algorithm 1: Online SBEED learning with experience replay (a hedged sketch of such a loop appears after this table). |
| Open Source Code | No | No explicit statement or link providing access to the open-source code for the described methodology. |
| Open Datasets | Yes | We tested SBEED across multiple continuous control tasks from the OpenAI Gym benchmark (Brockman et al., 2016) using the MuJoCo simulator (Todorov et al., 2012), including Pendulum-v0, InvertedDoublePendulum-v1, HalfCheetah-v1, Swimmer-v1, and Hopper-v1 (see the environment-setup snippet after this table). |
| Dataset Splits | No | The paper uses continuous control tasks from the OpenAI Gym and the MuJoCo simulator, which involve agent interaction with an environment to generate data for training. It does not describe explicit train/validation/test dataset splits in terms of percentages or sample counts for a pre-collected static dataset. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., specific GPU/CPU models or cloud instance types). |
| Software Dependencies | No | The paper mentions 'OpenAI Gym' and the 'MuJoCo simulator' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The goal of our experimental evaluation is twofold: (i) to better understand the effect of each algorithmic component in the proposed algorithm; (ii) to demonstrate the stability and efficiency of SBEED in both off-policy and on-policy settings. Therefore, we conducted an ablation study on SBEED, and a comprehensive comparison to state-of-the-art reinforcement learning algorithms. While we derive and present SBEED for the single-step Bellman error case, it can be extended to multi-step cases as shown in the long version. In our experiment, we used this multi-step version. and We varied λ and evaluated the performance of SBEED. and The effect of such cancellation is controlled by η ∈ [0, 1], and we expected an intermediate value to give the best performance. This is verified by the experiment of varying η, as shown in Figure 1(b). and We tested the performance of the algorithm with different lookahead lengths (denoted by k). |
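
For reference, the continuous-control tasks quoted in the Open Datasets row can be created through the OpenAI Gym API. The snippet below is a minimal sketch, assuming an older Gym release (e.g. the 0.9.x line) that still ships the v0/v1 environment IDs and a working mujoco-py install for the MuJoCo-backed tasks; it is not taken from the paper.

```python
import gym

# Continuous-control tasks named in the paper's experiments.
# Note: the v0/v1 IDs require an older Gym release (and MuJoCo/mujoco-py for
# the MuJoCo-backed tasks); newer Gym/Gymnasium versions renamed these environments.
ENV_IDS = [
    "Pendulum-v0",
    "InvertedDoublePendulum-v1",
    "HalfCheetah-v1",
    "Swimmer-v1",
    "Hopper-v1",
]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```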
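The Pseudocode and Experiment Setup rows describe an online, replay-based primal-dual procedure built on an entropy-smoothed Bellman residual, with λ weighting the entropy term and η ∈ [0, 1] controlling a cancellation (bias/variance) term. The sketch below is a hedged reconstruction of such a loop under that reading, not the authors' implementation: the Pendulum-v0 choice, network sizes, learning rates, the λ/η values, and the exact form of the η-weighted cancellation term are assumptions, and the multi-step lookahead k used in the paper's experiments is omitted.

```python
import random
from collections import deque

import gym
import numpy as np
import torch
import torch.nn as nn

# Illustrative hyperparameters only -- not the paper's settings.
GAMMA = 0.99   # discount factor
LAM = 0.01     # entropy-regularization weight (lambda in the paper)
ETA = 0.5      # weight of the cancellation term (eta in the paper)
BATCH, BUFFER_CAP = 64, 10_000

env = gym.make("Pendulum-v0")          # assumes an older Gym release with the v0 ID
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.Tanh(), nn.Linear(64, n_out))

value = mlp(obs_dim, 1)                # primal: V(s)
policy_mu = mlp(obs_dim, act_dim)      # primal: mean of Gaussian policy pi(a|s)
log_std = nn.Parameter(torch.zeros(act_dim))
dual = mlp(obs_dim + act_dim, 1)       # dual: nu(s, a), fit toward E[delta | s, a]

primal_opt = torch.optim.Adam(
    list(value.parameters()) + list(policy_mu.parameters()) + [log_std], lr=1e-3)
dual_opt = torch.optim.Adam(dual.parameters(), lr=1e-3)

def log_prob(s, a):
    dist = torch.distributions.Normal(policy_mu(s), log_std.exp())
    return dist.log_prob(a).sum(-1, keepdim=True)

buffer = deque(maxlen=BUFFER_CAP)
obs = env.reset()

for step in range(5_000):
    # Act with the current stochastic policy and store the transition in the replay buffer.
    with torch.no_grad():
        s = torch.as_tensor(obs, dtype=torch.float32)
        a = torch.distributions.Normal(policy_mu(s), log_std.exp()).sample().numpy()
    a = np.clip(a, env.action_space.low, env.action_space.high)
    next_obs, r, done, _ = env.step(a)
    buffer.append((obs, a, r, next_obs, float(done)))
    obs = env.reset() if done else next_obs

    if len(buffer) < BATCH:
        continue

    # Sample a minibatch of past transitions.
    s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                      for x in zip(*random.sample(buffer, BATCH)))
    r, d = r.unsqueeze(-1), d.unsqueeze(-1)

    # Entropy-smoothed one-step Bellman residual:
    #   delta = R + gamma * V(s') - lambda * log pi(a|s) - V(s)
    # (the paper's experiments use a multi-step variant, omitted here for brevity).
    delta = r + GAMMA * (1 - d) * value(s2) - LAM * log_prob(s, a) - value(s)
    nu = dual(torch.cat([s, a], dim=-1))

    # Dual step: fit nu(s, a) toward the sampled residual (inner maximization).
    dual_loss = ((delta.detach() - nu) ** 2).mean()
    dual_opt.zero_grad(); dual_loss.backward(); dual_opt.step()

    # Primal step on V and pi: squared residual minus an eta-weighted cancellation
    # term built from the (detached) dual estimate; this is an assumed surrogate.
    primal_loss = (delta ** 2).mean() - ETA * ((delta - nu.detach()) ** 2).mean()
    primal_opt.zero_grad(); primal_loss.backward(); primal_opt.step()
```

In this sketch, setting η near 0 reduces the primal step to a plain squared-residual update, while η near 1 subtracts the fitted dual term more aggressively; the paper's ablation over λ, η, and k would correspond to rerunning such a loop over a grid of those values.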