The Differentiable Cross-Entropy Method
Authors: Brandon Amos, Denis Yarats
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate applications of the cross-entropy method in structured prediction, control, and reinforcement learning. |
| Researcher Affiliation | Collaboration | Facebook AI Research; New York University. |
| Pseudocode | Yes | Algorithm 1 DCEM(f_θ, g_φ, φ_1; τ, N, k, T) and Algorithm 2 Learning an embedded control space with DCEM |
| Open Source Code | Yes | Our PyTorch (Paszke et al., 2019) source code is openly available at github.com/facebookresearch/dcem and uses the PyTorch LML implementation from github.com/locuslab/lml to compute eq. (4). |
| Open Datasets | Yes | We use the standard cartpole dynamical system from Barto et al. (1983) with a continuous state-action space. ... cheetah.run and walker.walk continuous locomotion tasks from the DeepMind control suite (Tassa et al., 2018) using the MuJoCo physics engine (Todorov et al., 2012). |
| Dataset Splits | No | The paper mentions training on "2M timesteps" and evaluating on "100 test episodes," but it does not specify explicit percentages or counts for training, validation, and test splits needed for exact data partitioning reproducibility. |
| Hardware Specification | No | The paper mentions running computations that are "GPU-amenable" but does not specify any particular models of GPUs, CPUs, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions using PyTorch, the DeepMind control suite, MuJoCo, and PPO, but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For illustrative purposes we consider a simple unidimensional regression task... Both of these are trained to take 10 optimizer steps and we use an inner learning rate of 0.1 for gradient descent and with DCEM we use 10 iterations with 100 samples per iteration and 10 elite candidates, with a temperature of 1. For DCEM over the embedded space we use 10 iterations with 100 samples in each iteration and 10 elite candidates, again with a temperature of 1. |
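The table refers to three pieces: Algorithm 1 (DCEM), the LML projection used to compute eq. (4), and the DCEM settings for the regression task (10 iterations, 100 samples, 10 elites, temperature 1). The sketch below is a minimal, dependency-free illustration of how these pieces might fit together in a forward pass over a one-dimensional Gaussian. It is not the authors' implementation. Their code uses PyTorch and the locuslab/lml package and is differentiable end to end, whereas this version uses plain floats and shows only the computation. The sketch assumes the closed form y_i = σ(x_i + ν) of the LML projection, with ν found by bisection. The names `lml_project` and `dcem_1d`, the quadratic objective, the N(0, 1) initialisation, the negation used for minimisation, and the variance floor are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
import random
from typing import Callable


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lml_project(x: list[float], k: int, iters: int = 100) -> list[float]:
    """Soft top-k weights y_i = sigmoid(x_i + nu), with nu chosen so sum(y) == k.

    The sum is increasing in nu, so bisection on the bracket below finds the
    root whenever 0 < k < len(x). (Hypothetical helper, not the lml package API.)
    """
    if not 0 < k < len(x):
        raise ValueError("need 0 < k < len(x)")
    lo, hi = -max(x) - 50.0, -min(x) + 50.0  # sum < k at lo, sum > k at hi
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if sum(sigmoid(xi + nu) for xi in x) < k:
            lo = nu
        else:
            hi = nu
    nu = 0.5 * (lo + hi)
    return [sigmoid(xi + nu) for xi in x]


def dcem_1d(
    f: Callable[[float], float],
    mu: float,
    sigma: float,
    n_samples: int = 100,
    n_elite: int = 10,
    n_iters: int = 10,
    temp: float = 1.0,
    seed: int = 0,
) -> float:
    """Forward pass of a DCEM-style minimiser over a 1-D Gaussian (sketch)."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        vs = [f(x) for x in xs]
        # The sketch minimises f, so values are negated before projection so
        # that low objective values get weights near 1 (assumed sign convention).
        w = lml_project([-v / temp for v in vs], n_elite)
        total = sum(w)  # equals n_elite up to bisection error
        # Weighted maximum-likelihood refit of the Gaussian's mean and variance.
        mu = sum(wi * xi for wi, xi in zip(w, xs)) / total
        var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, xs)) / total
        sigma = math.sqrt(var) + 1e-6  # hypothetical floor keeps sigma > 0
    return mu  # mean of the final sampling distribution


if __name__ == "__main__":
    # 10 iterations, 100 samples, 10 elites, temperature 1 as quoted above;
    # the quadratic objective and N(0, 1) initialisation are hypothetical.
    x_hat = dcem_1d(lambda x: (x - 2.0) ** 2, mu=0.0, sigma=1.0)
    print(f"approximate minimiser: {x_hat:.3f}")
```

In this sketch the temperature sets how sharp the soft elite selection is. When objective differences among the samples are small compared with `temp`, the weights approach the uniform value k/N, and the refit mostly reproduces the sample mean and variance.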