Self-Paced Deep Reinforcement Learning

Authors: Pascal Klink, Carlo D'Eramo, Jan R. Peters, Joni Pajarinen

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In the conducted experiments, the curricula generated with the proposed algorithm significantly improve learning performance across several environments and deep RL algorithms, matching or outperforming state-of-the-art existing CRL algorithms. |
| Researcher Affiliation | Academia | Pascal Klink¹, Carlo D'Eramo¹, Jan Peters¹, Joni Pajarinen¹,² (¹Intelligent Autonomous Systems, Technische Universität Darmstadt, Germany; ²Department of Electrical Engineering and Automation, Aalto University, Finland) |
| Pseudocode | Yes | Algorithm 1: Self-Paced Deep Reinforcement Learning (a minimal sketch of this loop follows the table) |
| Open Source Code | Yes | Code for running the experiments can be found at https://github.com/psclklnk/spdl |
| Open Datasets | Yes | We use the OpenAI Gym simulation environment [53]... We use the Nvidia Isaac Gym simulator [54] for this experiment. |
| Dataset Splits | No | The paper describes training and evaluation in continuous environments but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper mentions 'on our hardware' but does not provide specific details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions software such as the Stable Baselines library and the SciPy library but does not provide version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We evaluate the performance using TRPO [16], PPO [17] and SAC [18]. For all DRL algorithms, we use the implementations provided in the Stable Baselines library [52]. ... In each iteration, the parameter $\alpha_i$ is chosen such that the KL divergence penalty w.r.t. the current context distribution is in constant proportion $\zeta$ to the average reward obtained during the last iteration of policy optimization: $\alpha_i = B(\nu_i, D_i) = \zeta \, \frac{\frac{1}{K} \sum_{k=1}^{K} R(\tau_k, c_k)}{D_{\mathrm{KL}}\!\left(p_{\nu_i}(c) \,\|\, \mu(c)\right)}$ ... For the experiments, we restrict $p_\nu(c)$ to be Gaussian. (A hedged sketch of this schedule follows the table.) |
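
To make the Pseudocode row concrete: Algorithm 1 alternates policy optimization on contexts sampled from the current distribution $p_{\nu_i}(c)$ with a KL-penalized update of $\nu_i$ toward the target distribution $\mu(c)$. Below is a minimal Python sketch under our reading of the paper; `rl_step`, `kl_to_target`, and `update_dist` are hypothetical caller-supplied hooks standing in for the DRL iteration and the context-distribution update, not the authors' API.

```python
import numpy as np

def self_paced_curriculum(rl_step, kl_to_target, update_dist,
                          init_params, zeta, n_iters, k_contexts, seed=0):
    """Hedged sketch of the SPDL outer loop (Algorithm 1 in the paper).

    rl_step(contexts)    -> array of K episode returns (one iteration of the
                            underlying DRL algorithm, e.g. TRPO/PPO/SAC)
    kl_to_target(params) -> D_KL(p_nu || mu) for the current parameters
    update_dist(params, contexts, returns, alpha) -> updated parameters
    All three hooks are hypothetical stand-ins, not the released code's API.
    """
    rng = np.random.default_rng(seed)
    params = init_params  # (mean, cov) of the Gaussian context distribution
    for _ in range(n_iters):
        mean, cov = params
        # Sample K contexts from the current Gaussian context distribution.
        contexts = rng.multivariate_normal(mean, cov, size=k_contexts)
        # One iteration of the underlying DRL algorithm on these contexts.
        returns = rl_step(contexts)
        # KL-penalty schedule: alpha_i is kept in constant proportion zeta to
        # the average return from the last policy-optimization iteration.
        alpha = zeta * np.mean(returns) / kl_to_target(params)
        # Penalized update of nu_i toward the target distribution mu(c).
        params = update_dist(params, contexts, returns, alpha)
    return params
```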
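
The $\alpha_i$ schedule quoted in the Experiment Setup row can also be computed directly: since the paper restricts $p_\nu(c)$ to be Gaussian, the KL divergence in the denominator has a closed form. The sketch below is self-contained NumPy; the function names and the example values are ours, not taken from the released code.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form D_KL(N(mu0, cov0) || N(mu1, cov1)) for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def kl_penalty_alpha(returns, mu_ctx, cov_ctx, mu_target, cov_target, zeta):
    """alpha_i = zeta * (average return) / D_KL(p_nu_i || mu), per the schedule
    quoted above. `returns` holds the K episode returns R(tau_k, c_k)."""
    return zeta * np.mean(returns) / gaussian_kl(mu_ctx, cov_ctx,
                                                 mu_target, cov_target)

# Illustrative values only: a 2-D context space where the current context
# distribution is still far from a narrow target distribution.
alpha = kl_penalty_alpha(returns=np.array([45.0, 52.0, 38.0]),
                         mu_ctx=np.zeros(2), cov_ctx=np.eye(2),
                         mu_target=np.array([3.0, 3.0]),
                         cov_target=0.1 * np.eye(2),
                         zeta=0.05)
print(f"alpha_i = {alpha:.4f}")  # small alpha: the KL penalty still dominates
```

Note the behavior this schedule produces: while returns are low, $\alpha_i$ stays small and the context distribution is free to remain easy; as the average return grows, $\alpha_i$ grows proportionally and the curriculum is pushed toward the target distribution $\mu(c)$.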