Inferring DQN structure for high-dimensional continuous control

Authors: Andrey Sakryukin, Chedy Raissi, Mohan Kankanhalli

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is based on uncertainty estimation techniques and yields substantially higher scores for MuJoCo environments with high-dimensional continuous action spaces, as well as a realistic AAA sailing simulator game. First, we estimate the performance of the proposed method on a challenging task of continuous control in the MuJoCo (Todorov et al., 2012) physics simulator. We compare performances on four environments: Hopper, Walker2d, HalfCheetah and Humanoid.
Researcher Affiliation | Collaboration | 1 School of Computing, National University of Singapore; 2 INRIA Nancy Grand Est, France; 3 Ubisoft, Singapore.
Pseudocode | Yes | Algorithm 1: Inferring Module Composition Structure
    Function InferStruct(n: number of runs, S):
        all_us = []
        for i = 1; i < n; i++ do
            games_us = []
            while game_score < S do
                PlayStep()
                Train()
                if IsEpisodeFinished() then
                    games_us.append(MeasureUncertainty())
                end
            end
            all_us.append(games_us)
        end
        u = mean(all_us, axis=0)
        g = cluster(u)
        c = sort(g)
        return c
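The following is a minimal Python sketch of the loop in Algorithm 1, assuming hypothetical agent helpers (play_step, train, is_episode_finished, measure_uncertainty) and interpreting the final cluster/sort step as grouping action modules by their average uncertainty; it is not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def infer_struct(agent, env, n_runs, score_threshold, n_groups=2):
    """Sketch of Algorithm 1: collect per-module uncertainty estimates over
    several training runs, then cluster and order the modules.
    All agent/env helper methods used here are assumptions, not the paper's API."""
    all_us = []
    for _ in range(n_runs):
        game_us = []
        score = 0.0
        while score < score_threshold:
            score = agent.play_step(env)        # one environment interaction (assumed helper)
            agent.train()                       # one gradient update (assumed helper)
            if agent.is_episode_finished():
                # per-module uncertainty estimate recorded at each episode end
                game_us.append(agent.measure_uncertainty())
        all_us.append(game_us)

    # Average uncertainties across runs (truncated to a common number of episodes).
    min_len = min(len(run) for run in all_us)
    u = np.mean([np.asarray(run[:min_len]) for run in all_us], axis=0)  # (episodes, modules)

    # Cluster modules by mean uncertainty and return an ordering over them.
    per_module = u.mean(axis=0)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(per_module.reshape(-1, 1))
    order = sorted(range(len(per_module)), key=lambda m: per_module[m])
    return labels, order
```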
Open Source Code | Yes | Our code is available at: https://github.com/asakryukin/InferringDQN
Open Datasets | Yes | First, we estimate the performance of the proposed method on a challenging task of continuous control in the MuJoCo (Todorov et al., 2012) physics simulator. We compare performances on four environments: Hopper, Walker2d, HalfCheetah and Humanoid.
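For reference, the four MuJoCo tasks named above are available through OpenAI Gym; a minimal setup sketch follows (the "-v2" version suffixes are an assumption, not stated in the paper).

```python
import gym

# The paper evaluates on these four MuJoCo continuous-control tasks.
# Version suffixes are assumptions; use the IDs registered in your Gym install.
ENV_IDS = ["Hopper-v2", "Walker2d-v2", "HalfCheetah-v2", "Humanoid-v2"]

envs = {env_id: gym.make(env_id) for env_id in ENV_IDS}
for env_id, env in envs.items():
    print(env_id, env.observation_space.shape, env.action_space.shape)
```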
Dataset Splits | No | The paper describes training on simulation environments and evaluates performance based on game scores, but does not specify explicit train/validation/test dataset splits by percentage or sample count, nor does it refer to pre-defined splits with citations.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions software components like the 'DDQN model', 'MuJoCo physics simulator', and 'ReLU activation function', but does not provide specific version numbers for these or other relevant software dependencies, such as deep learning frameworks (e.g., TensorFlow, PyTorch), Python, or MuJoCo itself.
Experiment Setup | Yes | Network details. In this work we used a DDQN model with a target network update frequency of 1000. The network's main trunk is a 2-layer MLP with 512 and 256 neurons. Action modules were represented by a hidden layer of 128 neurons. All layers use the ReLU activation function. The replay buffer size was set to 1 million entries.
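A minimal PyTorch sketch of the architecture described above: a shared 512/256 trunk feeding per-action-dimension modules with a 128-unit hidden layer. Class names, argument names, the number of discretization bins, and the module wiring are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class ModularDQN(nn.Module):
    """Illustrative sketch of the described network: a shared 512/256 MLP trunk
    with one 128-unit module per action dimension (names and wiring are assumptions)."""
    def __init__(self, obs_dim, n_action_dims, n_bins_per_dim):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
        )
        # One action module per (discretized) action dimension.
        self.action_modules = nn.ModuleList([
            nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, n_bins_per_dim))
            for _ in range(n_action_dims)
        ])

    def forward(self, obs):
        h = self.trunk(obs)
        # Returns one Q-value vector per action dimension.
        return [m(h) for m in self.action_modules]

# Usage sketch: Hopper has an 11-dim observation and 3 action dimensions;
# discretizing each action dimension into 11 bins is an assumption.
net = ModularDQN(obs_dim=11, n_action_dims=3, n_bins_per_dim=11)
q_values = net(torch.randn(32, 11))
```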