OFFER: Off-Environment Reinforcement Learning
Authors: Kamil Ciosek, Shimon Whiteson
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically compare OFFER to a policy gradient baseline (primary optimisation only) on variants of the mountain car task, as well as a simulated robot arm. |
| Researcher Affiliation | Academia | Kamil Ciosek, Shimon Whiteson; Department of Computer Science, University of Oxford, United Kingdom; {kamil.ciosek,shimon.whiteson}@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1 CRITIC-REINFORCE(τ) ... Algorithm 5 SECONDARY-OPTIMISATION(τ, ψ) |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the described methodology. |
| Open Datasets | No | The paper mentions the 'mountain car benchmark task' and a 'simulated robotic-arm control task' proposed by Paul et al. (2016) but does not provide concrete access information (link, DOI, repository, or clear citation to a publicly available dataset) for the specific datasets or environments used in their modified experiments. |
| Dataset Splits | No | The paper does not explicitly state specific training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cluster specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions algorithms like ADAM but does not provide specific software names with version numbers for reproducibility (e.g., 'Python 3.x', 'PyTorch 1.x'). |
| Experiment Setup | No | The paper states 'complete details are in the supplementary material' for the experimental setup, and the main text only provides general descriptions like 'θ consists of the parameters of a standard tile-coding function approximator' and 'the agent moves each joint by at most 30% of the allowed movement range', without specific hyperparameter values or comprehensive system-level training settings. |