Causal Influence Detection for Improving Efficiency in Reinforcement Learning
Authors: Maximilian Seitzer, Bernhard Schölkopf, Georg Martius
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All modified algorithms show strong increases in data efficiency on robotic manipulation tasks. ... Each of our investigations is backed by empirical evaluations in robotic manipulation environments and demonstrates a clear improvement of the state-of-the-art with the same generic influence measure. |
| Researcher Affiliation | Academia | Maximilian Seitzer, MPI for Intelligent Systems, Tübingen, Germany (maximilian.seitzer@tue.mpg.de); Bernhard Schölkopf, MPI for Intelligent Systems, Tübingen, Germany (bs@tue.mpg.de); Georg Martius, MPI for Intelligent Systems, Tübingen, Germany (georg.martius@tue.mpg.de) |
| Pseudocode | No | The paper describes algorithms and methods but does not contain any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a link to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | Furthermore, we test on the FETCHPICKANDPLACE environment from OpenAI Gym [53]... We consider the environments FETCHPUSH, FETCHPICKANDPLACE from OpenAI Gym [55], and FETCHROTTABLE which is our modification containing a rotating table (explained in Suppl. B.3). (A minimal Gym usage sketch appears below the table.) |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test splits. It mentions '5 random seeds' for testing results but not data splitting. |
| Hardware Specification | No | The paper does not specify any hardware components (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'OpenAI Gym' environments and 'DDPG [56] with hindsight experience replay (HER) [57]' as the base RL algorithm, but it does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | For our method, we use CAI estimated according to Eq. 4 (with K = 64)... For every exploratory action (ϵ is 30% in our experiments)... In the figure, CAI uses 100% active exploration and λ_bonus = 0.2 as the bonus reward scale. We use DDPG [56] with hindsight experience replay (HER) [57] as the base RL algorithm, a combination that achieves state-of-the-art results in these environments. More information about the experimental settings can be found in Suppl. F. (A hedged sketch of the CAI estimator appears below the table.) |
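As a concrete illustration of how the quoted environments are accessed, here is a minimal sketch using classic Gym. The exact environment ID suffix (`-v1`) is an assumption, the Fetch tasks additionally require a MuJoCo installation, and newer releases ship these tasks in `gymnasium-robotics` instead.

```python
import gym  # classic Gym; newer releases moved the Fetch tasks to gymnasium-robotics

# Environment ID is an assumption based on the names quoted above.
env = gym.make("FetchPickAndPlace-v1")
obs = env.reset()  # dict with 'observation', 'achieved_goal', 'desired_goal'
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
```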
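The quoted setup also lends itself to a short sketch of how CAI might be estimated from a learned transition model. This is a hedged illustration of a Monte Carlo estimator in the spirit of the paper's Eq. 4, not the authors' implementation: `toy_model`, `STATE_DIM`, `ACTION_DIM`, and the uniform action sampling are hypothetical stand-ins.

```python
import torch
from torch.distributions import Normal

STATE_DIM, ACTION_DIM, K = 10, 4, 64  # K = 64 as in the quoted setup (hypothetical dims)

def cai_score(model, state, k=K):
    """Monte Carlo estimate of causal action influence I(A; S' | s).
    `model(state, action) -> Normal` is a hypothetical interface to a
    learned transition model over (part of) the next state."""
    actions = torch.rand(k, ACTION_DIM) * 2 - 1   # a_1..a_K sampled uniformly in [-1, 1]
    dists = [model(state, a) for a in actions]    # p(s' | s, a_k) for each sampled action
    samples = [d.sample() for d in dists]         # one next-state draw per action
    scores = []
    for d_k, x in zip(dists, samples):
        log_joint = d_k.log_prob(x).sum()         # log p(x | s, a_k)
        # log of the action-marginal mixture: log (1/K) * sum_m p(x | s, a_m)
        log_marginal = torch.logsumexp(
            torch.stack([d_m.log_prob(x).sum() for d_m in dists]), dim=0
        ) - torch.log(torch.tensor(float(k)))
        scores.append(log_joint - log_marginal)
    return torch.stack(scores).mean()             # >= 0 in expectation; higher = more influence

def toy_model(state, action):
    # Hypothetical stand-in for a learned transition model: the predicted
    # mean depends on the action, so the action has causal influence.
    mean = state + 0.1 * action.sum()
    return Normal(mean, 0.1 * torch.ones_like(state))

state = torch.zeros(STATE_DIM)
print(cai_score(toy_model, state))
```

Per the quoted setup, such a score could then steer exploratory actions (ϵ = 30%) or enter the reward as a bonus scaled by λ_bonus = 0.2; how exactly it is wired into DDPG + HER is described in the paper's Suppl. F.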