A Simple Decentralized Cross-Entropy Method
Authors: Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that, compared to the classical centralized approach using either a single or even a mixture of Gaussian distributions, our DecentCEM finds the global optimum much more consistently and thus improves the sample efficiency. Furthermore, we plug our DecentCEM into the planning problem of MBRL, and evaluate our approach in several continuous control environments, with comparison to the state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show sample efficiency improvement by simply replacing the classical CEM module with our DecentCEM module, while only sacrificing a reasonable amount of computational cost. Lastly, we conduct ablation studies for more in-depth analysis. |
| Researcher Affiliation | Collaboration | Zichen Zhang1, Jun Jin2, Martin Jagersand1, Jun Luo2, Dale Schuurmans1; 1University of Alberta, 2Huawei Noah's Ark Lab |
| Pseudocode | Yes | Theorem 3.1 (Convergence of DecentCEM). If a CEM instance described in Algorithm 3 converges, and we decentralize it by evenly dividing its sample size Nk into M CEM instances which satisfy Assumption 1, then the resulting DecentCEM converges almost surely to the best solution of the individual instances. |
| Open Source Code | Yes | Code is available at https://github.com/vincentzhang/DecentCEM. |
| Open Datasets | Yes | Environments: We run the benchmark in a set of OpenAI Gym [Brockman et al., 2016] and MuJoCo [Todorov et al., 2012] environments commonly used in the MBRL literature: Pendulum, InvertedPendulum, Cartpole, Acrobot, FixedSwimmer, Reacher, Hopper, Walker2D, HalfCheetah, PETS-Reacher3D, PETS-HalfCheetah, PETS-Pusher, Ant. The three environments prefixed by PETS are proposed by Chua et al. [2018]. |
| Dataset Splits | No | The paper describes an evaluation protocol for reinforcement learning environments (e.g., "The learning curve shows the mean and standard error of the test performance out of 5 independent training runs") but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined static dataset splits, which are typically found in supervised learning contexts. |
| Hardware Specification | Yes | All experiments were conducted using Tesla V100 Volta GPUs. |
| Software Dependencies | No | The paper mentions software environments such as OpenAI Gym and MuJoCo but does not provide specific version numbers for any software components or libraries required to reproduce the experiments. |
| Experiment Setup | Yes | We reuse default hyperparameters for these algorithms from the original papers if not mentioned specifically. For our proposed methods, we include two variations, DecentCEM-A and DecentCEM-P, as described in Sec. 4, where the suffix carries the same meaning as in POPLIN-A/P. The ensemble size of DecentCEM-A/P as well as detailed hyperparameters for all algorithms are listed in Appendix D.2. |
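To make the decentralization idea in the abstract and Theorem 3.1 concrete, here is a minimal sketch of a decentralized cross-entropy method: the total sample budget is split evenly across M independent CEM instances with diverse initializations, and the best solution found by any instance is returned. This is an illustrative reconstruction, not the authors' released code; function names (`cem`, `decent_cem`), initialization ranges, and hyperparameter defaults are assumptions for the example.

```python
import numpy as np


def cem(objective, mu, sigma, n_samples=20, n_elite=5, n_iters=30, rng=None):
    """One CEM instance: repeatedly sample from a Gaussian, keep the elite
    samples, and refit the Gaussian to them (maximization convention)."""
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_f = mu.copy(), objective(mu)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([objective(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        if scores.max() > best_f:
            best_f, best_x = scores.max(), samples[scores.argmax()]
    return best_x, best_f


def decent_cem(objective, dim, n_total=100, m_instances=5, seed=0, **kw):
    """Decentralized CEM: split the sample budget n_total evenly over
    m_instances independent CEM runs and return the best result found."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(m_instances):
        mu0 = rng.uniform(-2.0, 2.0, size=dim)  # diverse random starts
        results.append(
            cem(objective, mu0, np.ones(dim),
                n_samples=n_total // m_instances, rng=rng, **kw)
        )
    return max(results, key=lambda r: r[1])
```

On a multimodal objective, a single centralized Gaussian can collapse onto a local optimum, whereas the independent instances explore separate basins; picking the per-instance best is exactly the selection step that Theorem 3.1's "best solution of the individual instances" refers to.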