Self-Adaptive Double Bootstrapped DDPG
Authors: Zhuobin Zheng, Chun Yuan, Zhihui Lin, Yangyang Cheng, Hanghao Wu
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate and demonstrate the effectiveness and efficiency of SOUP on OpenAI Gym's [Brockman et al., 2016] MuJoCo continuous control environments, Hopper and Walker2d [Todorov et al., 2012]. We conduct extensive experiments to evaluate the performance of our approach from different perspectives, including comparisons on bootstrapped models, confidence strategies and multiple heads. |
| Researcher Affiliation | Academia | Zhuobin Zheng¹,², Chun Yuan², Zhihui Lin¹,², Yangyang Cheng¹,², Hanghao Wu¹,²; ¹Department of Computer Science and Technologies, Tsinghua University; ²Graduate School at Shenzhen, Tsinghua University |
| Pseudocode | Yes | Algorithm 1 Self-Adaptive Double Bootstrapped DDPG (a sketch of the underlying bootstrapped loop appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate our algorithm on the following continuous robotic environments implemented in the MuJoCo simulator [Todorov et al., 2012] from OpenAI Gym [Brockman et al., 2016] (see Figure 3 for a visualization). Hopper-v1: a two-dimensional one-legged robot is rewarded for hopping forward as fast as possible (S ⊆ ℝ¹¹, A ⊆ ℝ³). Walker2d-v1: this environment extends Hopper to a bipedal robot in 2D space, rewarded for walking forward as fast as possible (S ⊆ ℝ¹⁷, A ⊆ ℝ⁶). A dimension check appears after this table. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test splits with percentages or sample counts, as would be typical for static datasets. Instead, performance is measured over 10k episodes in continuous-control environments, a different evaluation paradigm. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU model, CPU type, memory specifications). |
| Software Dependencies | No | The paper mentions software components like 'Adam', 'Leaky ReLU', 'OpenAI Gym', and the 'MuJoCo simulator' but does not specify their version numbers, which are required for full reproducibility. |
| Experiment Setup | Yes | To ensure comparability, unless otherwise stated, we keep the common hyperparameters and the network architecture the same in the experiments. We denote the hidden layer sizes as (N, **M**), where the bold number indicates the head layer size. For double bootstrapped DDPG, we use (256, 256, **128**) for the critic and (256, **128**) for the actor. Adam [Kingma and Ba, 2015] is adopted for training the actor and critic networks with learning rates of 1e-4 and 3e-4 respectively. We use a discount factor γ = 0.99, a soft update rate τ = 1e-3, a minibatch size n = 1024 and a replay memory size R = 1e6. All activation layers use Leaky ReLU [Maas et al., 2013] and the output layer of the actor uses TanH followed by scale and shift operations. A PyTorch sketch of this setup appears after this table. |
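
The paper's Algorithm 1 is only cited by name above. As a point of reference, the following is a minimal, self-contained Python sketch of the generic bootstrapped mechanics such an algorithm builds on: one head is sampled to act for each episode, and every transition is stored with a Bernoulli mask that decides which heads later train on it. The head count `K`, the mask probability `p_mask`, and the toy buffer layout are illustrative assumptions, and the paper's self-adaptive confidence strategy is not reproduced here.

```python
# Hedged sketch of bootstrapped head selection and masked replay.
# K, p_mask, and the toy buffer are assumptions, not the paper's values;
# the self-adaptive confidence strategy of SOUP is omitted.
import numpy as np

K = 10        # number of actor-critic heads (assumed)
p_mask = 0.5  # Bernoulli mask probability (assumed)
rng = np.random.default_rng(0)

class MaskedReplay:
    """Replay buffer storing a per-head bootstrap mask with each step."""
    def __init__(self):
        self.steps, self.masks = [], []

    def add(self, transition):
        self.steps.append(transition)
        # Each head trains on this transition only where its mask bit is 1.
        self.masks.append(rng.binomial(1, p_mask, size=K))

    def sample(self, n):
        idx = rng.integers(len(self.steps), size=min(n, len(self.steps)))
        return [self.steps[i] for i in idx], np.array([self.masks[i] for i in idx])

replay = MaskedReplay()
for episode in range(3):
    k = rng.integers(K)      # one head drives exploration for the whole episode
    for t in range(5):       # stand-in for an environment rollout
        replay.add(("s_t", "a_t", "r_t", "s_next"))
    batch, masks = replay.sample(4)
    # Each head j would then do a DDPG update weighted by masks[:, j].
    print(f"episode {episode}: acting head={k}, mask column sums={masks.sum(axis=0)}")
```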
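
The state and action dimensionalities quoted in the Open Datasets row can be checked directly from Gym. This snippet assumes an older Gym release (with mujoco-py installed) in which the -v1 MuJoCo environments still exist; current Gym/Gymnasium releases ship later revisions such as Hopper-v4.

```python
# Verify the (S, A) dimensions quoted above: Hopper (11, 3), Walker2d (17, 6).
# Requires an older gym release plus mujoco-py for the -v1 environments.
import gym

for name in ["Hopper-v1", "Walker2d-v1"]:
    env = gym.make(name)
    print(name,
          "obs dim:", env.observation_space.shape[0],
          "act dim:", env.action_space.shape[0])
```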
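
The hyperparameters in the Experiment Setup row translate directly into code. The paper does not name a deep-learning framework, so the following is a hedged PyTorch sketch: the framework choice, the layer ordering, the default Leaky ReLU slope, and the single-head simplification (the bold head layer would be replicated once per bootstrapped head in the full model) are assumptions; the layer sizes, learning rates, and soft update rate follow the paper's quote.

```python
# PyTorch sketch of the reported architecture and optimizer settings.
# Framework choice, LeakyReLU slope, and the single-head simplification
# are assumptions; sizes and learning rates follow the paper's quote.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor with hidden sizes (256, 128); 128 is the head layer size."""
    def __init__(self, s_dim, a_dim, a_low=-1.0, a_high=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(),
            nn.Linear(128, a_dim), nn.Tanh(),
        )
        # Scale and shift so the TanH output covers [a_low, a_high].
        self.scale = (a_high - a_low) / 2
        self.shift = (a_high + a_low) / 2

    def forward(self, s):
        return self.net(s) * self.scale + self.shift

class Critic(nn.Module):
    """Critic with hidden sizes (256, 256, 128); 128 is the head layer size."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, 256), nn.LeakyReLU(),
            nn.Linear(256, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=1e-3):
    # target <- tau * source + (1 - tau) * target, with tau from the paper
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

# Hopper-v1 dimensions; learning rates as reported (actor 1e-4, critic 3e-4).
actor, critic = Actor(11, 3), Critic(11, 3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
```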