Proximal Distilled Evolutionary Reinforcement Learning

Authors: Cristian Bodnar, Ben Day, Pietro Liò

AAAI 2020, pp. 3283-3290

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PDERL in five robot locomotion settings from the OpenAI gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments. This section evaluates the performance of the proposed methods, and also takes a closer look at the behaviour of the proposed operators. ... Figure 4 shows the mean reward and the standard deviation obtained by all algorithms on five MuJoCo (Todorov, Erez, and Tassa 2012) environments.
Researcher Affiliation | Academia | Cristian Bodnar, Ben Day, Pietro Liò, Department of Computer Science & Technology, University of Cambridge, Cambridge, United Kingdom, cb2015@cam.ac.uk
Pseudocode | Yes | Algorithm 1: Distillation Crossover; Algorithm 2: Proximal Mutation (a hedged sketch of a sensitivity-scaled mutation is given after the table)
Open Source Code | Yes | Our code is publicly available at https://github.com/crisbodnar/pderl.
Open Datasets | Yes | We evaluate PDERL in five robot locomotion settings from the OpenAI gym. ... Figure 4 shows the mean reward and the standard deviation obtained by all algorithms on five MuJoCo (Todorov, Erez, and Tassa 2012) environments. (An illustrative environment-setup snippet follows the table.)
Dataset Splits | No | The paper evaluates agents in continuous control environments and discusses hyperparameter tuning, but it does not specify explicit training, validation, and test splits with percentages or sample counts, as would be typical for supervised learning tasks. Instead, it refers to 'environment frames experienced' and 'evaluation rounds'.
Hardware Specification | No | The paper mentions 'limited computational resources' but does not provide any specific details about the hardware used for experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components like the Adam optimiser and the OpenAI gym but does not provide version numbers for these or any other software dependencies, which would be necessary for a reproducible replication.
Experiment Setup | Yes | For the PDERL-specific hyperparameters, we performed little tuning due to the limited computational resources. In what follows we report the chosen values alongside the values that were considered. The crossover and mutation batch sizes are N_C = 128 and N_M = 256 (searched over 64, 128, 256). The genetic memory has a capacity of κ = 8k transitions (2k, 4k, 8k, 10k). The learning rate for the distillation crossover is 10^-3 (10^-2, 10^-3, 10^-4, 10^-5), and the child policy is trained for 12 epochs (4, 8, 12, 16). All the training procedures use the Adam optimiser. (These values are gathered into a config sketch after the table.)
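The pseudocode row above names two operators from the paper, Distillation Crossover and Proximal Mutation. As a rough illustration of the second idea (perturb weights less where the policy's behaviour is most sensitive to them), here is a minimal PyTorch sketch. It is not the authors' Algorithm 2: the network, the noise scale sigma, and the single-backward-pass sensitivity estimate are simplifying assumptions made for illustration.

```python
import torch
import torch.nn as nn

def proximal_mutation(policy: nn.Module, states: torch.Tensor,
                      sigma: float = 0.1, eps: float = 1e-8) -> None:
    """Add Gaussian noise to the policy weights, scaled down where the
    policy's actions are most sensitive to the parameters (illustrative only)."""
    actions = policy(states)                       # (batch, action_dim)
    # One backward pass over the summed actions gives a rough per-parameter
    # measure of how strongly each weight influences the behaviour on `states`.
    grads = torch.autograd.grad(actions.sum(), list(policy.parameters()))
    with torch.no_grad():
        for param, grad in zip(policy.parameters(), grads):
            sensitivity = grad.abs() + eps         # avoid division by zero
            param.add_(torch.randn_like(param) * sigma / sensitivity)

# Toy usage: the observation/action dimensions are arbitrary placeholders.
policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
proximal_mutation(policy, torch.randn(256, 17))
```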
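For the open-datasets row, the "data" are Gym environments rather than a fixed dataset. The snippet below shows how such MuJoCo locomotion tasks are typically instantiated through the classic (pre-0.26) Gym API; the specific environment IDs and version suffixes are assumptions, since the quoted text only says "five robot locomotion settings".

```python
import gym

# Plausible MuJoCo locomotion task IDs; the exact set and versions used by the
# paper are not stated in the quoted text, so these are illustrative.
ENV_IDS = ["Hopper-v2", "HalfCheetah-v2", "Walker2d-v2", "Ant-v2", "Swimmer-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)          # -v2 tasks require mujoco-py to be installed
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```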
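Finally, the experiment-setup row lists concrete hyperparameter values. A minimal sketch gathering them into one place is given below; the dictionary keys and the optimiser helper are illustrative choices, and only the numerical values come from the quoted text.

```python
import torch

# Values quoted in the experiment-setup row; key names are illustrative.
PDERL_CONFIG = {
    "crossover_batch_size": 128,       # N_C, searched over {64, 128, 256}
    "mutation_batch_size": 256,        # N_M, searched over {64, 128, 256}
    "genetic_memory_capacity": 8_000,  # kappa, searched over {2k, 4k, 8k, 10k} transitions
    "distillation_lr": 1e-3,           # searched over {1e-2, 1e-3, 1e-4, 1e-5}
    "distillation_epochs": 12,         # searched over {4, 8, 12, 16}
}

def make_distillation_optimizer(child_policy: torch.nn.Module) -> torch.optim.Adam:
    """The paper states all training procedures use Adam; wire in the reported LR."""
    return torch.optim.Adam(child_policy.parameters(), lr=PDERL_CONFIG["distillation_lr"])
```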