A Simple Decentralized Cross-Entropy Method
Authors: Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that, compared to the classical centralized approach using either a single or even a mixture of Gaussian distributions, our DecentCEM finds the global optimum much more consistently and thus improves the sample efficiency. Furthermore, we plug our DecentCEM into the planning problem of MBRL, and evaluate our approach in several continuous control environments, with comparison to the state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show sample efficiency improvement by simply replacing the classical CEM module with our DecentCEM module, while only sacrificing a reasonable amount of computational cost. Lastly, we conduct ablation studies for more in-depth analysis. |
| Researcher Affiliation | Collaboration | Zichen Zhang1, Jun Jin2, Martin Jagersand1, Jun Luo2, Dale Schuurmans1; 1University of Alberta, 2Huawei Noah's Ark Lab |
| Pseudocode | Yes | Theorem 3.1 (Convergence of DecentCEM). If a CEM instance described in Algorithm 3 converges, and we decentralize it by evenly dividing its sample size Nk into M CEM instances which satisfy Assumption 1, then the resulting DecentCEM converges almost surely to the best solution of the individual instances. |
| Open Source Code | Yes | Code is available at https://github.com/vincentzhang/DecentCEM. |
| Open Datasets | Yes | Environments: We run the benchmark in a set of OpenAI Gym [Brockman et al., 2016] and MuJoCo [Todorov et al., 2012] environments commonly used in the MBRL literature: Pendulum, InvertedPendulum, Cartpole, Acrobot, FixedSwimmer, Reacher, Hopper, Walker2D, HalfCheetah, PETS-Reacher3D, PETS-HalfCheetah, PETS-Pusher, Ant. The three environments prefixed by PETS are proposed by Chua et al. [2018]. |
| Dataset Splits | No | The paper describes an evaluation protocol for reinforcement learning environments (e.g., "The learning curve shows the mean and standard error of the test performance out of 5 independent training runs") but does not provide specific train/validation/test dataset splits with percentages, sample counts, or citations to predefined static dataset splits, which are typically found in supervised learning contexts. |
| Hardware Specification | Yes | All experiments were conducted using Tesla V100 Volta GPUs. |
| Software Dependencies | No | The paper mentions software environments such as OpenAI Gym and MuJoCo but does not provide specific version numbers for any software components or libraries required to reproduce the experiments. |
| Experiment Setup | Yes | We reuse default hyperparameters for these algorithms from the original papers if not mentioned specifically. For our proposed methods, we include two variations, DecentCEM-A and DecentCEM-P, as described in Sec. 4, where the suffix carries the same meaning as in POPLIN-A/P. The ensemble size of DecentCEM-A/P as well as detailed hyperparameters for all algorithms are listed in Appendix D.2. |
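To make the decentralization idea in the abstract and Theorem 3.1 concrete, here is a minimal sketch of a decentralized cross-entropy method: the total sample budget is split evenly across M independent CEM instances with diverse initializations, and the best solution found by any instance is returned. This is an illustrative reconstruction, not the authors' released code; function names (`cem`, `decent_cem`), initialization ranges, and hyperparameter defaults are assumptions for the example.

```python
import numpy as np


def cem(objective, mu, sigma, n_samples=20, n_elite=5, n_iters=30, rng=None):
    """One CEM instance: repeatedly sample from a Gaussian, keep the elite
    samples, and refit the Gaussian to them (maximization convention)."""
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_f = mu.copy(), objective(mu)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, mu.size))
        scores = np.array([objective(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        if scores.max() > best_f:
            best_f, best_x = scores.max(), samples[scores.argmax()]
    return best_x, best_f


def decent_cem(objective, dim, n_total=100, m_instances=5, seed=0, **kw):
    """Decentralized CEM: split the sample budget n_total evenly over
    m_instances independent CEM runs and return the best result found."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(m_instances):
        mu0 = rng.uniform(-2.0, 2.0, size=dim)  # diverse random starts
        results.append(
            cem(objective, mu0, np.ones(dim),
                n_samples=n_total // m_instances, rng=rng, **kw)
        )
    return max(results, key=lambda r: r[1])
```

On a multimodal objective, a single centralized Gaussian can collapse onto a local optimum, whereas the independent instances explore separate basins; picking the per-instance best is exactly the selection step that Theorem 3.1's "best solution of the individual instances" refers to.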