A Simple Decentralized Cross-Entropy Method

Authors: Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that, compared to the classical centralized approach using either a single Gaussian distribution or even a mixture of Gaussian distributions, our DecentCEM finds the global optimum much more consistently, thus improving sample efficiency. Furthermore, we plug DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, comparing against the state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show a sample-efficiency improvement from simply replacing the classical CEM module with our DecentCEM module, while sacrificing only a reasonable amount of computational cost. Lastly, we conduct ablation studies for more in-depth analysis. (An illustrative decentralized-CEM sketch appears after this table.)
Researcher Affiliation | Collaboration | Zichen Zhang (University of Alberta), Jun Jin (Huawei Noah's Ark Lab), Martin Jagersand (University of Alberta), Jun Luo (Huawei Noah's Ark Lab), Dale Schuurmans (University of Alberta)
Pseudocode | Yes | Theorem 3.1 (Convergence of DecentCEM). If a CEM instance described in Algorithm 3 converges, and we decentralize it by evenly dividing its sample size N_k into M CEM instances that satisfy Assumption 1, then the resulting DecentCEM converges almost surely to the best solution among the individual instances. (A symbolic restatement of this conclusion follows the table.)
Open Source Code | Yes | Code is available at https://github.com/vincentzhang/decentCEM.
Open Datasets | Yes | Environments: We run the benchmark in a set of OpenAI Gym [Brockman et al., 2016] and MuJoCo [Todorov et al., 2012] environments commonly used in the MBRL literature: Pendulum, InvertedPendulum, Cartpole, Acrobot, FixedSwimmer, Reacher, Hopper, Walker2D, HalfCheetah, PETS-Reacher3D, PETS-HalfCheetah, PETS-Pusher, and Ant. The three environments prefixed by PETS were proposed by Chua et al. [2018]. (A small environment smoke test is sketched after the table.)
Dataset Splits | No | The paper describes an evaluation protocol for reinforcement learning environments (e.g., "The learning curve shows the mean and standard error of the test performance out of 5 independent training runs"), but it does not provide train/validation/test dataset splits with percentages, sample counts, or citations to predefined static splits, as would typically be found in supervised learning contexts.
Hardware Specification | Yes | All experiments were conducted using Tesla V100 Volta GPUs.
Software Dependencies | No | The paper mentions software environments such as OpenAI Gym and MuJoCo but does not provide specific version numbers for any software components or libraries required to reproduce the experiments.
Experiment Setup | Yes | We reuse the default hyperparameters for these algorithms from the original papers unless mentioned otherwise. For our proposed methods, we include two variations, DecentCEM-A and DecentCEM-P, as described in Sec. 4, where the suffix carries the same meaning as in POPLIN-A/P. The ensemble size of DecentCEM-A/P as well as detailed hyperparameters for all algorithms are listed in Appendix D.2. (An illustrative configuration layout is sketched at the end of the examples below.)
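
The following is a minimal sketch of the decentralized CEM idea summarized in the Research Type row: several independent CEM instances, each with its own Gaussian sampling distribution and an even share of the sample budget, are run side by side, and the best solution found by any instance is returned. It is not the authors' implementation; the function names (run_cem_instance, decent_cem), the toy objective, and every hyperparameter value are illustrative assumptions.

import numpy as np

def run_cem_instance(f, mean, std, n_samples, n_elites, n_iters, rng):
    # One independent CEM instance with its own Gaussian sampling distribution.
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, mean.shape[0]))
        rewards = np.array([f(x) for x in samples])
        elites = samples[np.argsort(rewards)[-n_elites:]]   # keep the top-k candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean, f(mean)

def decent_cem(f, dim, total_samples=200, n_instances=5, n_elites=8, n_iters=20, seed=0):
    # Split the total sample budget evenly over M independent instances,
    # start each from a different random mean, and keep the best solution.
    rng = np.random.default_rng(seed)
    per_instance = total_samples // n_instances
    best_x, best_r = None, -np.inf
    for _ in range(n_instances):
        mean0 = rng.uniform(-2.0, 2.0, size=dim)
        x, r = run_cem_instance(f, mean0, np.ones(dim), per_instance, n_elites, n_iters, rng)
        if r > best_r:
            best_x, best_r = x, r
    return best_x, best_r

if __name__ == "__main__":
    # Multimodal toy objective: a single-Gaussian CEM often locks onto a local
    # optimum here, while the ensemble usually recovers the global one.
    f = lambda x: float(np.sin(3.0 * x[0]) + 0.5 * np.cos(5.0 * x[1]) - 0.05 * np.sum(x ** 2))
    print(decent_cem(f, dim=2))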
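
A hedged restatement of the conclusion of Theorem 3.1 in symbols, under the assumption that each of the M instances (run with sample size N_k / M) converges almost surely to some limit; the notation J for the objective and x_m^* for the per-instance limit is ours, not the paper's:

\[
  m^{\star} \;=\; \arg\max_{m \in \{1,\dots,M\}} J\!\left(x^{\star}_{m}\right),
  \qquad
  x_t \;\xrightarrow{\ \mathrm{a.s.}\ }\; x^{\star}_{m^{\star}} \quad \text{as } t \to \infty,
\]

i.e., the decentralized ensemble inherits the best of the individual instances' limits.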
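
A small smoke test for the standard benchmark environments listed in the Open Datasets row could look as follows. The environment IDs and versions are assumptions (the paper does not pin versions), the old-style Gym API (reset() returning an observation, step() returning a 4-tuple) is assumed, and the PETS-* and FixedSwimmer tasks are custom environments that ship with the authors' code rather than stock Gym registrations.

import gym

# Environment IDs below are guesses at the standard Gym/MuJoCo registrations.
for env_id in ["Pendulum-v1", "InvertedPendulum-v2", "Acrobot-v1", "Reacher-v2",
               "Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Ant-v2"]:
    env = gym.make(env_id)
    obs = env.reset()
    for _ in range(5):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    env.close()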
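
Finally, a purely illustrative configuration layout showing where the ensemble size and the per-instance CEM hyperparameters from the Experiment Setup row would plug in; every key name and value below is a placeholder, not the settings from Appendix D.2.

# Illustrative only: none of these values come from the paper.
config = {
    "algo": "DecentCEM-P",          # or "DecentCEM-A"
    "ensemble_size": 5,             # number of independent CEM instances M
    "cem": {
        "population_size": 400,     # total samples, split evenly over the ensemble
        "elite_fraction": 0.1,
        "iterations": 5,
        "init_std": 0.5,
    },
    "planning_horizon": 30,
    "training_runs": 5,             # independent runs, matching the evaluation protocol
}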