Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy

Authors: Fan-Ming Luo, Shengyi Jiang, Yang Yu, ZongZhang Zhang, Yi-Feng Zhang (pp. 7637-7646)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use a grid-world task and 5 locomotion control tasks with changing parameters to empirically assess our algorithm. Experiment results show that in environments with both in-distribution and out-of-distribution parameter changes, ESCP can not only better recover the environment encoding, but also adapt more rapidly to the post-change environment (10× faster in the grid-world) while the return performance is kept or improved, compared with state-of-the-art meta-RL methods.
Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; 2 Polixir Technologies, Nanjing 210038, China; 3 Alibaba Group, Hangzhou 310052, China
Pseudocode | Yes | Algorithm 1: Training Process of ESCP
Open Source Code | Yes | We release our code at GitHub.
Open Datasets | No | We first conduct experiments on a manually designed grid-world environment. We also made comparisons on 5 MuJoCo tasks (Todorov, Erez, and Tassa 2012; Brockman et al. 2016).
Dataset Splits | No | At the training phase, we sample 16 environments with different contexts. At the testing phase, we also sample 16 environments and make them change suddenly.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or library versions) are explicitly mentioned in the paper.
Experiment Setup | No | While the paper describes the components of ESCP and mentions using SAC and the Adam optimizer, it does not provide specific numerical hyperparameter values (e.g., learning rate, batch size, values for λ or η) or other detailed training configurations necessary for replication in the main text.