Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Authors: Maximilian Seitzer, Bernhard Schölkopf, Georg Martius

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | All modified algorithms show strong increases in data efficiency on robotic manipulation tasks. ... Each of our investigations is backed by empirical evaluations in robotic manipulation environments and demonstrates a clear improvement of the state-of-the-art with the same generic influence measure. |
| Researcher Affiliation | Academia | Maximilian Seitzer, MPI for Intelligent Systems, Tübingen, Germany (maximilian.seitzer@tue.mpg.de); Bernhard Schölkopf, MPI for Intelligent Systems, Tübingen, Germany (bs@tue.mpg.de); Georg Martius, MPI for Intelligent Systems, Tübingen, Germany (georg.martius@tue.mpg.de) |
| Pseudocode | No | The paper describes algorithms and methods but does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes (see sketch below) | Furthermore, we test on the FETCHPICKANDPLACE environment from OpenAI Gym [53]... We consider the environments FETCHPUSH, FETCHPICKANDPLACE from OpenAI Gym [55], and FETCHROTTABLE, which is our modification containing a rotating table (explained in Suppl. B.3). |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test splits. It mentions '5 random seeds' for the reported results but not data splitting. |
| Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU or GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions OpenAI Gym environments and DDPG [56] with hindsight experience replay (HER) [57] as the base RL algorithm, but does not provide version numbers for software libraries or dependencies. |
| Experiment Setup | Yes (see sketches below) | For our method, we use CAI estimated according to Eq. 4 (with K = 64)... For every exploratory action (ε is 30% in our experiments)... In the figure, CAI uses 100% active exploration and λ_bonus = 0.2 as the bonus reward scale. We use DDPG [56] with hindsight experience replay (HER) [57] as the base RL algorithm, a combination that achieves state-of-the-art results in these environments. More information about the experimental settings can be found in Suppl. F. |
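
The environments cited in the Open Datasets row are the publicly available Fetch robotics tasks. As a minimal sketch (not the authors' code, and assuming a Gym version contemporaneous with the paper, i.e. classic `gym` with the MuJoCo robotics extras installed), they can be instantiated through the standard Gym registry; FETCHROTTABLE is the authors' custom modification and is not part of that registry.

```python
# Minimal sketch (not the authors' code): loading the publicly available Fetch
# environments named in the Open Datasets row, assuming classic `gym` with the
# MuJoCo robotics extras. FetchRotTable is the authors' custom modification and
# is not available in the Gym registry.
import gym

env = gym.make("FetchPickAndPlace-v1")   # likewise "FetchPush-v1"
obs = env.reset()                        # dict with 'observation', 'achieved_goal', 'desired_goal'

action = env.action_space.sample()       # 4-D continuous action: end-effector displacement + gripper
obs, reward, done, info = env.step(action)
env.close()
```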
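
The hyperparameters quoted in the Experiment Setup row can be collected in one place. The configuration object below is purely illustrative: the field names are hypothetical, while the values are the ones reported in the paper.

```python
# Illustrative configuration collecting the hyperparameters quoted in the
# Experiment Setup row. Field names are hypothetical; values are as reported.
from dataclasses import dataclass

@dataclass
class CAISetup:
    num_action_samples: int = 64      # K in Eq. 4, actions sampled for the CAI estimate
    exploration_epsilon: float = 0.3  # fraction of exploratory actions (ε = 30%)
    active_exploration: float = 1.0   # 100% active exploration in the reported figure
    bonus_scale: float = 0.2          # λ_bonus, scale of the CAI bonus reward
    base_algorithm: str = "DDPG+HER"  # DDPG [56] with hindsight experience replay [57]

setup = CAISetup()
```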
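
For the CAI estimate itself, the row only states that Eq. 4 is used with K = 64. The sketch below assumes Eq. 4 is a Monte-Carlo estimate of the conditional mutual information between the action and an entity's next state: sample K actions, query a learned transition model, and average the KL divergence of each per-action prediction against the action-marginalized mixture. The `transition_model` and `action_sampler` interfaces are hypothetical, and a discrete outcome distribution is used only to keep the example self-contained.

```python
# Hedged sketch of a CAI-style influence score, assuming Eq. 4 is a Monte-Carlo
# estimate of I(A; S'_j | s). `transition_model(state, action)` is a hypothetical
# learned model returning a probability vector over the entity's next state.
import numpy as np

def cai_score(state, transition_model, action_sampler, num_samples=64):
    """Average KL divergence between per-action predictions and their mixture."""
    actions = [action_sampler() for _ in range(num_samples)]
    conditionals = np.stack([transition_model(state, a) for a in actions])  # (K, n_outcomes)
    mixture = conditionals.mean(axis=0)                                     # marginal over sampled actions
    kls = np.sum(
        conditionals * (np.log(conditionals + 1e-12) - np.log(mixture + 1e-12)),
        axis=1,
    )
    return float(kls.mean())  # high value: the action influences the entity at this state
```

A score of this kind can then be scaled by λ_bonus and added as an exploration bonus, or used to gate exploratory actions, which is how the Experiment Setup row describes it being applied.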