Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Authors: Maximilian Seitzer, Bernhard Schölkopf, Georg Martius

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | All modified algorithms show strong increases in data efficiency on robotic manipulation tasks. ... Each of our investigations is backed by empirical evaluations in robotic manipulation environments and demonstrates a clear improvement of the state-of-the-art with the same generic influence measure. |
| Researcher Affiliation | Academia | Maximilian Seitzer, MPI for Intelligent Systems, Tübingen, Germany (maximilian.seitzer@tue.mpg.de); Bernhard Schölkopf, MPI for Intelligent Systems, Tübingen, Germany (bs@tue.mpg.de); Georg Martius, MPI for Intelligent Systems, Tübingen, Germany (georg.martius@tue.mpg.de) |
| Pseudocode | No | The paper describes algorithms and methods but does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes (see sketch below) | Furthermore, we test on the FETCHPICKANDPLACE environment from OpenAI Gym [53]... We consider the environments FETCHPUSH, FETCHPICKANDPLACE from OpenAI Gym [55], and FETCHROTTABLE, which is our modification containing a rotating table (explained in Suppl. B.3). |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test splits. It mentions '5 random seeds' for the reported results but not data splitting. |
| Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU or GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions OpenAI Gym environments and DDPG [56] with hindsight experience replay (HER) [57] as the base RL algorithm, but does not provide version numbers for software libraries or dependencies. |
| Experiment Setup | Yes (see sketches below) | For our method, we use CAI estimated according to Eq. 4 (with K = 64)... For every exploratory action (ε is 30% in our experiments)... In the figure, CAI uses 100% active exploration and λ_bonus = 0.2 as the bonus reward scale. We use DDPG [56] with hindsight experience replay (HER) [57] as the base RL algorithm, a combination that achieves state-of-the-art results in these environments. More information about the experimental settings can be found in Suppl. F. |
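
The environments cited in the Open Datasets row are the publicly available Fetch robotics tasks. As a minimal sketch (not the authors' code, and assuming a Gym version contemporaneous with the paper, i.e. classic `gym` with the MuJoCo robotics extras installed), they can be instantiated through the standard Gym registry; FETCHROTTABLE is the authors' custom modification and is not part of that registry.

```python
# Minimal sketch (not the authors' code): loading the publicly available Fetch
# environments named in the Open Datasets row, assuming classic `gym` with the
# MuJoCo robotics extras. FetchRotTable is the authors' custom modification and
# is not available in the Gym registry.
import gym

env = gym.make("FetchPickAndPlace-v1")   # likewise "FetchPush-v1"
obs = env.reset()                        # dict with 'observation', 'achieved_goal', 'desired_goal'

action = env.action_space.sample()       # 4-D continuous action: end-effector displacement + gripper
obs, reward, done, info = env.step(action)
env.close()
```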
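
The hyperparameters quoted in the Experiment Setup row can be collected in one place. The configuration object below is purely illustrative: the field names are hypothetical, while the values are the ones reported in the paper.

```python
# Illustrative configuration collecting the hyperparameters quoted in the
# Experiment Setup row. Field names are hypothetical; values are as reported.
from dataclasses import dataclass

@dataclass
class CAISetup:
    num_action_samples: int = 64      # K in Eq. 4, actions sampled for the CAI estimate
    exploration_epsilon: float = 0.3  # fraction of exploratory actions (ε = 30%)
    active_exploration: float = 1.0   # 100% active exploration in the reported figure
    bonus_scale: float = 0.2          # λ_bonus, scale of the CAI bonus reward
    base_algorithm: str = "DDPG+HER"  # DDPG [56] with hindsight experience replay [57]

setup = CAISetup()
```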
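
For the CAI estimate itself, the row only states that Eq. 4 is used with K = 64. The sketch below assumes Eq. 4 is a Monte-Carlo estimate of the conditional mutual information between the action and an entity's next state: sample K actions, query a learned transition model, and average the KL divergence of each per-action prediction against the action-marginalized mixture. The `transition_model` and `action_sampler` interfaces are hypothetical, and a discrete outcome distribution is used only to keep the example self-contained.

```python
# Hedged sketch of a CAI-style influence score, assuming Eq. 4 is a Monte-Carlo
# estimate of I(A; S'_j | s). `transition_model(state, action)` is a hypothetical
# learned model returning a probability vector over the entity's next state.
import numpy as np

def cai_score(state, transition_model, action_sampler, num_samples=64):
    """Average KL divergence between per-action predictions and their mixture."""
    actions = [action_sampler() for _ in range(num_samples)]
    conditionals = np.stack([transition_model(state, a) for a in actions])  # (K, n_outcomes)
    mixture = conditionals.mean(axis=0)                                     # marginal over sampled actions
    kls = np.sum(
        conditionals * (np.log(conditionals + 1e-12) - np.log(mixture + 1e-12)),
        axis=1,
    )
    return float(kls.mean())  # high value: the action influences the entity at this state
```

A score of this kind can then be scaled by λ_bonus and added as an exploration bonus, or used to gate exploratory actions, which is how the Experiment Setup row describes it being applied.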