Self-Adaptive Double Bootstrapped DDPG
Authors: Zhuobin Zheng, Chun Yuan, Zhihui Lin, Yangyang Cheng, Hanghao Wu
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate and demonstrate the effectiveness and efficiency of SOUP on OpenAI Gym's [Brockman et al., 2016] MuJoCo continuous control environments, Hopper and Walker2d [Todorov et al., 2012]. We conduct extensive experiments to evaluate the performance of our approach from different perspectives, including comparisons on bootstrapped models, confidence strategies and multiple heads. |
| Researcher Affiliation | Academia | Zhuobin Zheng¹,², Chun Yuan², Zhihui Lin¹,², Yangyang Cheng¹,², Hanghao Wu¹,²; ¹Department of Computer Science and Technologies, Tsinghua University; ²Graduate School at Shenzhen, Tsinghua University |
| Pseudocode | Yes | Algorithm 1 Self-Adaptive Double Bootstrapped DDPG (a sketch of the underlying bootstrapped loop appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | We evaluate our algorithm on the following continuous robotic environments implemented in the MuJoCo simulator [Todorov et al., 2012] from OpenAI Gym [Brockman et al., 2016] (see Figure 3 for a visualization). Hopper-v1: a two-dimensional one-legged robot is rewarded for hopping forward as fast as possible (S ⊆ ℝ¹¹, A ⊆ ℝ³). Walker2d-v1: this environment extends Hopper to a bipedal robot in 2D space, rewarded for walking forward as fast as possible (S ⊆ ℝ¹⁷, A ⊆ ℝ⁶). A dimension check appears after this table. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test splits with percentages or sample counts, as would be typical for static datasets. Instead, performance is measured over 10k episodes in continuous-control environments, a different evaluation paradigm. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., GPU model, CPU type, memory specifications). |
| Software Dependencies | No | The paper mentions software components like 'Adam', 'Leaky ReLU', 'OpenAI Gym', and the 'MuJoCo simulator' but does not specify their version numbers, which are required for full reproducibility. |
| Experiment Setup | Yes | To ensure comparability, unless otherwise stated, we keep the common hyperparameters and the network architecture the same in the experiments. We denote the hidden layer sizes as (N, **M**), where the bold number indicates the head layer size. For double bootstrapped DDPG, we use (256, 256, **128**) for the critic and (256, **128**) for the actor. Adam [Kingma and Ba, 2015] is adopted for training the actor and critic networks with learning rates of 1e-4 and 3e-4 respectively. We use a discount factor γ = 0.99, a soft update rate τ = 1e-3, a minibatch size n = 1024 and a replay memory size R = 1e6. All activation layers use Leaky ReLU [Maas et al., 2013] and the output layer of the actor uses TanH followed by scale and shift operations. A PyTorch sketch of this setup appears after this table. |
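
The paper's Algorithm 1 is only cited by name above. As a point of reference, the following is a minimal, self-contained Python sketch of the generic bootstrapped mechanics such an algorithm builds on: one head is sampled to act for each episode, and every transition is stored with a Bernoulli mask that decides which heads later train on it. The head count `K`, the mask probability `p_mask`, and the toy buffer layout are illustrative assumptions, and the paper's self-adaptive confidence strategy is not reproduced here.

```python
# Hedged sketch of bootstrapped head selection and masked replay.
# K, p_mask, and the toy buffer are assumptions, not the paper's values;
# the self-adaptive confidence strategy of SOUP is omitted.
import numpy as np

K = 10        # number of actor-critic heads (assumed)
p_mask = 0.5  # Bernoulli mask probability (assumed)
rng = np.random.default_rng(0)

class MaskedReplay:
    """Replay buffer storing a per-head bootstrap mask with each step."""
    def __init__(self):
        self.steps, self.masks = [], []

    def add(self, transition):
        self.steps.append(transition)
        # Each head trains on this transition only where its mask bit is 1.
        self.masks.append(rng.binomial(1, p_mask, size=K))

    def sample(self, n):
        idx = rng.integers(len(self.steps), size=min(n, len(self.steps)))
        return [self.steps[i] for i in idx], np.array([self.masks[i] for i in idx])

replay = MaskedReplay()
for episode in range(3):
    k = rng.integers(K)      # one head drives exploration for the whole episode
    for t in range(5):       # stand-in for an environment rollout
        replay.add(("s_t", "a_t", "r_t", "s_next"))
    batch, masks = replay.sample(4)
    # Each head j would then do a DDPG update weighted by masks[:, j].
    print(f"episode {episode}: acting head={k}, mask column sums={masks.sum(axis=0)}")
```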
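
The state and action dimensionalities quoted in the Open Datasets row can be checked directly from Gym. This snippet assumes an older Gym release (with mujoco-py installed) in which the -v1 MuJoCo environments still exist; current Gym/Gymnasium releases ship later revisions such as Hopper-v4.

```python
# Verify the (S, A) dimensions quoted above: Hopper (11, 3), Walker2d (17, 6).
# Requires an older gym release plus mujoco-py for the -v1 environments.
import gym

for name in ["Hopper-v1", "Walker2d-v1"]:
    env = gym.make(name)
    print(name,
          "obs dim:", env.observation_space.shape[0],
          "act dim:", env.action_space.shape[0])
```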
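
The hyperparameters in the Experiment Setup row translate directly into code. The paper does not name a deep-learning framework, so the following is a hedged PyTorch sketch: the framework choice, the layer ordering, the default Leaky ReLU slope, and the single-head simplification (the bold head layer would be replicated once per bootstrapped head in the full model) are assumptions; the layer sizes, learning rates, and soft update rate follow the paper's quote.

```python
# PyTorch sketch of the reported architecture and optimizer settings.
# Framework choice, LeakyReLU slope, and the single-head simplification
# are assumptions; sizes and learning rates follow the paper's quote.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor with hidden sizes (256, 128); 128 is the head layer size."""
    def __init__(self, s_dim, a_dim, a_low=-1.0, a_high=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(),
            nn.Linear(128, a_dim), nn.Tanh(),
        )
        # Scale and shift so the TanH output covers [a_low, a_high].
        self.scale = (a_high - a_low) / 2
        self.shift = (a_high + a_low) / 2

    def forward(self, s):
        return self.net(s) * self.scale + self.shift

class Critic(nn.Module):
    """Critic with hidden sizes (256, 256, 128); 128 is the head layer size."""
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, 256), nn.LeakyReLU(),
            nn.Linear(256, 256), nn.LeakyReLU(),
            nn.Linear(256, 128), nn.LeakyReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=1e-3):
    # target <- tau * source + (1 - tau) * target, with tau from the paper
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

# Hopper-v1 dimensions; learning rates as reported (actor 1e-4, critic 3e-4).
actor, critic = Actor(11, 3), Critic(11, 3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
```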