Proximal Distilled Evolutionary Reinforcement Learning

Authors: Cristian Bodnar, Ben Day, Pietro Liò

AAAI 2020, pp. 3283-3290

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PDERL in five robot locomotion settings from the OpenAI gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments. This section evaluates the performance of the proposed methods, and also takes a closer look at the behaviour of the proposed operators. ... Figure 4 shows the mean reward and the standard deviation obtained by all algorithms on five MuJoCo (Todorov, Erez, and Tassa 2012) environments.
Researcher Affiliation | Academia | Cristian Bodnar, Ben Day, Pietro Liò, Department of Computer Science & Technology, University of Cambridge, Cambridge, United Kingdom, cb2015@cam.ac.uk
Pseudocode | Yes | Algorithm 1: Distillation Crossover; Algorithm 2: Proximal Mutation (a hedged sketch of a sensitivity-scaled mutation is given after the table)
Open Source Code | Yes | Our code is publicly available at https://github.com/crisbodnar/pderl.
Open Datasets | Yes | We evaluate PDERL in five robot locomotion settings from the OpenAI gym. ... Figure 4 shows the mean reward and the standard deviation obtained by all algorithms on five MuJoCo (Todorov, Erez, and Tassa 2012) environments. (An illustrative environment-setup snippet follows the table.)
Dataset Splits | No | The paper evaluates agents in continuous control environments and discusses hyperparameter tuning, but it does not specify explicit training, validation, and test splits with percentages or sample counts, as would be typical for supervised learning tasks. Instead, it refers to 'environment frames experienced' and 'evaluation rounds'.
Hardware Specification | No | The paper mentions 'limited computational resources' but does not provide any specific details about the hardware used for experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components like the Adam optimiser and the OpenAI gym but does not provide version numbers for these or any other software dependencies, which would be necessary for a reproducible replication.
Experiment Setup | Yes | For the PDERL-specific hyperparameters, we performed little tuning due to the limited computational resources. In what follows we report the chosen values alongside the values that were considered. The crossover and mutation batch sizes are N_C = 128 and N_M = 256 (searched over 64, 128, 256). The genetic memory has a capacity of κ = 8k transitions (2k, 4k, 8k, 10k). The learning rate for the distillation crossover is 10^-3 (10^-2, 10^-3, 10^-4, 10^-5), and the child policy is trained for 12 epochs (4, 8, 12, 16). All the training procedures use the Adam optimiser. (These values are gathered into a config sketch after the table.)
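The pseudocode row above names two operators from the paper, Distillation Crossover and Proximal Mutation. As a rough illustration of the second idea (perturb weights less where the policy's behaviour is most sensitive to them), here is a minimal PyTorch sketch. It is not the authors' Algorithm 2: the network, the noise scale sigma, and the single-backward-pass sensitivity estimate are simplifying assumptions made for illustration.

```python
import torch
import torch.nn as nn

def proximal_mutation(policy: nn.Module, states: torch.Tensor,
                      sigma: float = 0.1, eps: float = 1e-8) -> None:
    """Add Gaussian noise to the policy weights, scaled down where the
    policy's actions are most sensitive to the parameters (illustrative only)."""
    actions = policy(states)                       # (batch, action_dim)
    # One backward pass over the summed actions gives a rough per-parameter
    # measure of how strongly each weight influences the behaviour on `states`.
    grads = torch.autograd.grad(actions.sum(), list(policy.parameters()))
    with torch.no_grad():
        for param, grad in zip(policy.parameters(), grads):
            sensitivity = grad.abs() + eps         # avoid division by zero
            param.add_(torch.randn_like(param) * sigma / sensitivity)

# Toy usage: the observation/action dimensions are arbitrary placeholders.
policy = nn.Sequential(nn.Linear(17, 64), nn.Tanh(), nn.Linear(64, 6))
proximal_mutation(policy, torch.randn(256, 17))
```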
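For the open-datasets row, the "data" are Gym environments rather than a fixed dataset. The snippet below shows how such MuJoCo locomotion tasks are typically instantiated through the classic (pre-0.26) Gym API; the specific environment IDs and version suffixes are assumptions, since the quoted text only says "five robot locomotion settings".

```python
import gym

# Plausible MuJoCo locomotion task IDs; the exact set and versions used by the
# paper are not stated in the quoted text, so these are illustrative.
ENV_IDS = ["Hopper-v2", "HalfCheetah-v2", "Walker2d-v2", "Ant-v2", "Swimmer-v2"]

for env_id in ENV_IDS:
    env = gym.make(env_id)          # -v2 tasks require mujoco-py to be installed
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, env.observation_space.shape, env.action_space.shape)
    env.close()
```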
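Finally, the experiment-setup row lists concrete hyperparameter values. A minimal sketch gathering them into one place is given below; the dictionary keys and the optimiser helper are illustrative choices, and only the numerical values come from the quoted text.

```python
import torch

# Values quoted in the experiment-setup row; key names are illustrative.
PDERL_CONFIG = {
    "crossover_batch_size": 128,       # N_C, searched over {64, 128, 256}
    "mutation_batch_size": 256,        # N_M, searched over {64, 128, 256}
    "genetic_memory_capacity": 8_000,  # kappa, searched over {2k, 4k, 8k, 10k} transitions
    "distillation_lr": 1e-3,           # searched over {1e-2, 1e-3, 1e-4, 1e-5}
    "distillation_epochs": 12,         # searched over {4, 8, 12, 16}
}

def make_distillation_optimizer(child_policy: torch.nn.Module) -> torch.optim.Adam:
    """The paper states all training procedures use Adam; wire in the reported LR."""
    return torch.optim.Adam(child_policy.parameters(), lr=PDERL_CONFIG["distillation_lr"])
```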