Inferring DQN structure for high-dimensional continuous control
Authors: Andrey Sakryukin, Chedy Raissi, Mohan Kankanhalli
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is based on uncertainty estimation techniques and yields substantially higher scores for MuJoCo environments with high-dimensional continuous action spaces, as well as a realistic AAA sailing simulator game. First, we estimate the performance of the proposed method on a challenging task of continuous control in the MuJoCo (Todorov et al., 2012) physics simulator. We compare performances on four environments: Hopper, Walker2d, HalfCheetah and Humanoid. |
| Researcher Affiliation | Collaboration | 1School of Computing, National University of Singapore 2INRIA Nancy Grand Est, France 3Ubisoft, Singapore. |
| Pseudocode | Yes | Algorithm 1: Inferring Module Composition Structure. Function InferStruct(n number of runs, S): all_us = []; for i = 1; i < n; i++ do: games_us = []; while game_score < S do: PlayStep(); Train(); if IsEpisodeFinished() then games_us.append(MeasureUncertainty()); end; all_us.append(games_us); end; u = mean(all_us, axis=0); g = cluster(u); c = sort(g); return c |
| Open Source Code | Yes | Our code is available at: https://github.com/asakryukin/InferringDQN |
| Open Datasets | Yes | First, we estimate the performance of the proposed method on a challenging task of continuous control in the MuJoCo (Todorov et al., 2012) physics simulator. We compare performances on four environments: Hopper, Walker2d, HalfCheetah and Humanoid. |
| Dataset Splits | No | The paper describes training on simulation environments and evaluates performance based on game scores, but does not specify explicit train/validation/test dataset splits by percentage or sample count, nor does it refer to pre-defined splits with citations. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software components like 'DDQN model', 'MuJoCo Physics simulator', and 'ReLU activation function', but does not provide specific version numbers for these or other relevant software dependencies, such as deep learning frameworks (e.g., TensorFlow, PyTorch), Python, or MuJoCo itself. |
| Experiment Setup | Yes | Network details. In this work we used DDQN model with target network frequency update of 1000. The network main trunk is a 2 layer MLP with 512 and 256 neurons. Action modules were represented by a hidden layer of 128 neurons. All layers use ReLU activation function. The replay buffer size was set to 1 million entries. |
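The Algorithm 1 pseudocode quoted above (average per-module uncertainties over n runs, cluster them, and sort the clusters to get the module composition order) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the training loop is elided (we assume uncertainty measurements per run are already collected), and the one-dimensional gap-based clustering with an `eps` threshold is an illustrative stand-in for the paper's unspecified clustering step.

```python
from statistics import mean

def infer_struct(uncertainty_runs, eps=0.1):
    """Sketch of Algorithm 1's post-training steps.

    uncertainty_runs: list of runs, each a list of per-action-module
    uncertainty estimates recorded once the agent reaches the target
    score S (PlayStep/Train are elided here).
    Returns module indices grouped into clusters, ordered by
    increasing mean uncertainty.
    """
    # u = mean(all_us, axis=0): average each module's uncertainty across runs
    u = [mean(vals) for vals in zip(*uncertainty_runs)]

    # g = cluster(u): toy 1-D clustering -- consecutive modules (in sorted
    # order) whose uncertainties differ by at most eps share a group
    order = sorted(range(len(u)), key=lambda i: u[i])
    groups, current = [], [order[0]]
    for i in order[1:]:
        if u[i] - u[current[-1]] <= eps:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)

    # c = sort(g): groups already emerge ordered by increasing mean
    # uncertainty, fixing the module composition order
    return groups
```

For example, with two runs over three modules, `infer_struct([[0.05, 0.9, 0.12], [0.07, 1.1, 0.1]])` groups the two low-uncertainty modules together and isolates the high-uncertainty one: `[[0, 2], [1]]`.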
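The "Network details" cell fully pins down the reported hyperparameters, which can be captured in a small config sketch. The field names, the `DDQNConfig` class, and the per-action discretization size `n_bins` are illustrative assumptions, not the authors' code; only the numeric values come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDQNConfig:
    """Hyperparameters as reported in the paper's 'Network details'.

    Names here are hypothetical; values are from the quoted excerpt.
    """
    trunk_sizes: tuple = (512, 256)       # 2-layer MLP main trunk
    module_hidden: int = 128              # hidden layer per action module
    activation: str = "relu"              # all layers use ReLU
    replay_buffer_size: int = 1_000_000   # 1 million entries
    target_update_freq: int = 1000        # target-network update frequency

def module_params(cfg: DDQNConfig, n_bins: int) -> int:
    """Parameter count (weights + biases) of one action module:
    trunk output (256) -> hidden (128) -> n_bins outputs.
    n_bins is an assumed per-action discretization size."""
    h = cfg.module_hidden
    return (cfg.trunk_sizes[-1] + 1) * h + (h + 1) * n_bins
```

For instance, with an assumed 11-way discretization per action dimension, each module adds `(256+1)*128 + (128+1)*11 = 34315` parameters on top of the shared trunk.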