Self-Paced Deep Reinforcement Learning
Authors: Pascal Klink, Carlo D'Eramo, Jan R. Peters, Joni Pajarinen
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the conducted experiments, the curricula generated with the proposed algorithm significantly improve learning performance across several environments and deep RL algorithms, matching or outperforming existing state-of-the-art CRL algorithms. |
| Researcher Affiliation | Academia | Pascal Klink¹, Carlo D'Eramo¹, Jan Peters¹, Joni Pajarinen¹,². ¹ Intelligent Autonomous Systems, Technische Universität Darmstadt, Germany; ² Department of Electrical Engineering and Automation, Aalto University, Finland |
| Pseudocode | Yes | Algorithm 1 Self-Paced Deep Reinforcement Learning |
| Open Source Code | Yes | Code for running the experiments can be found at https://github.com/psclklnk/spdl |
| Open Datasets | Yes | We use the OpenAI Gym simulation environment [53]... We use the Nvidia Isaac Gym simulator [54] for this experiment. |
| Dataset Splits | No | The paper describes training and evaluation in continuous environments but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper mentions 'on our hardware' but does not provide specific details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions software like 'Stable Baselines library' and 'SciPy library' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We evaluate the performance using TRPO [16], PPO [17] and SAC [18]. For all DRL algorithms, we use the implementations provided in the Stable Baselines library [52]. ... In each iteration, the parameter $\alpha_i$ is chosen such that the KL divergence penalty w.r.t. the current context distribution is in constant proportion $\zeta$ to the average reward obtained during the last iteration of policy optimization: $\alpha_i = B(\nu_i, D_i) = \zeta \left( \frac{1}{K} \sum_{k=1}^{K} R(\tau_k, c_k) \big/ D_{\mathrm{KL}}\left( p_{\nu_i}(c) \,\Vert\, \mu(c) \right) \right)$ ... For the experiments, we restrict $p_\nu(c)$ to be Gaussian. (A minimal sketch of this $\alpha_i$ computation follows the table.) |
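
The Experiment Setup row quotes the paper's rule for setting the KL penalty weight $\alpha_i$ as a fixed proportion $\zeta$ of the average episodic reward divided by the KL divergence between the current context distribution $p_{\nu_i}(c)$ and the target distribution $\mu(c)$. The snippet below is a minimal sketch of that scalar computation, assuming both distributions are multivariate Gaussians (the paper restricts $p_\nu(c)$ to be Gaussian) and using the closed-form Gaussian KL divergence; all function and variable names are illustrative and are not taken from the authors' released code at https://github.com/psclklnk/spdl.

```python
import numpy as np

def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(p || q) between two multivariate Gaussians."""
    d = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def compute_alpha(episode_rewards, mean_nu, cov_nu, mean_mu, cov_mu, zeta):
    """Sketch of alpha_i = zeta * (average episodic reward) / KL(p_nu_i || mu)."""
    avg_reward = np.mean(episode_rewards)  # (1/K) * sum_k R(tau_k, c_k)
    kl = gaussian_kl(mean_nu, cov_nu, mean_mu, cov_mu)
    return zeta * avg_reward / kl

# Illustrative usage with made-up numbers (not values from the paper).
alpha = compute_alpha(episode_rewards=np.array([12.3, 8.7, 10.1]),
                      mean_nu=np.zeros(2), cov_nu=np.eye(2),
                      mean_mu=np.ones(2), cov_mu=2.0 * np.eye(2),
                      zeta=1.6)
```

As the quoted rule suggests, higher average rewards yield a larger $\alpha_i$ and hence a stronger KL penalty toward the target $\mu(c)$; the sketch only reproduces this bookkeeping, not the full self-paced update of the context distribution.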