Deep Reinforcement Learning amidst Continual Structured Non-Stationarity

Authors: Annie Xie, James Harrison, Chelsea Finn

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experimental evaluation, we find that our method far outperforms RL algorithms that do not account for environment non-stationarity, handles extrapolating environment shifts, and retains strong performance in stationary settings.
Researcher Affiliation | Academia | Annie Xie, James Harrison, Chelsea Finn (Stanford University). Correspondence to: Annie Xie <anniexie@stanford.edu>.
Pseudocode | Yes | Algorithm 1: Lifelong Latent Actor-Critic (LILAC). (A structural sketch of this training loop appears after the table.)
Open Source Code | No | No explicit statement or link providing concrete access to the source code for the method described in this paper.
Open Datasets | Yes | The first is derived from the simulated Sawyer reaching task in the Meta-World benchmark (Yu et al., 2019), in which the target position is not observed and moves between episodes. In the second environment, based on Half-Cheetah from OpenAI Gym (Brockman et al., 2016)... We next consider the 8-DoF minitaur environment (Tan et al., 2018)...
Dataset Splits | No | No specific details on training, validation, and test dataset splits are provided (e.g., percentages or sample counts).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided for the experimental setup.
Software Dependencies | No | The paper mentions software components such as OpenAI Gym and Meta-World but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | We tune the hyperparameters for all approaches, and run each with the best hyperparameter setting with 3 random seeds. For all hyperparameter details, see Appendix B.
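
For readers who want a sense of what Algorithm 1 (the Pseudocode row above) looks like in practice, below is a minimal structural sketch of a lifelong latent actor-critic loop, assuming only the paper's high-level description: a latent variable is inferred for each episode from past experience, the policy is conditioned on that latent, and the latent-variable model and actor-critic are updated jointly from a replay buffer. All names here (DriftingTargetEnv, infer_latent, policy, update_model_and_actor_critic) are hypothetical placeholders and the learning updates are stubbed out; this is not the authors' implementation.

```python
# Structural sketch of a LILAC-style training loop (not the authors' code).
# The environment shifts between episodes in an unobserved way; the agent keeps
# a per-episode latent estimate and conditions its policy on it.
import numpy as np

rng = np.random.default_rng(0)

class DriftingTargetEnv:
    """Toy non-stationary environment: a 1-D target drifts between episodes."""
    def __init__(self):
        self.target = 0.0
        self.state = np.zeros(1)
    def new_episode(self):
        self.target += rng.normal(scale=0.1)   # unobserved shift between episodes
        self.state = np.zeros(1)
        return self.state
    def step(self, action):
        self.state = self.state + action
        reward = -abs(self.state[0] - self.target)
        return self.state, reward

def infer_latent(trajectory, z_prev):
    """Placeholder for the inference network q(z_i | trajectory_i, z_{i-1})."""
    rewards = np.array([r for (_, _, r) in trajectory])
    return 0.9 * z_prev + 0.1 * rewards.mean()   # stand-in for amortized inference

def policy(state, z):
    """Placeholder for the latent-conditioned policy pi(a | s, z)."""
    return np.clip(z - state, -1.0, 1.0)

def update_model_and_actor_critic(replay_buffer):
    """Placeholder for the joint latent-model (ELBO) and actor-critic updates."""
    pass

env = DriftingTargetEnv()
replay_buffer, z = [], 0.0
for episode in range(20):
    state = env.new_episode()
    trajectory = []
    for t in range(10):
        action = policy(state, z)          # act under latent estimated from past episodes
        next_state, reward = env.step(action)
        trajectory.append((state.copy(), action, reward))
        state = next_state
    replay_buffer.append((trajectory, z))
    z = infer_latent(trajectory, z)        # update belief about the shifting environment
    update_model_and_actor_critic(replay_buffer)
    print(f"episode {episode:2d}  return {sum(r for (_, _, r) in trajectory):.2f}")
```

The structural point, relative to a standard actor-critic loop, is the latent carried across episodes, which gives the policy a handle on the unobserved between-episode shift; in the paper this inference is performed by a learned variational model trained alongside a maximum-entropy actor-critic.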