Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy
Authors: Fan-Ming Luo, Shengyi Jiang, Yang Yu, ZongZhang Zhang, Yi-Feng Zhang (pp. 7637-7646)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use a grid-world task and 5 locomotion controlling tasks with changing parameters to empirically assess our algorithm. Experiment results show that in environments with both in-distribution and out-of-distribution parameter changes, ESCP can not only better recover the environment encoding, but also adapt more rapidly to the post-change environment (10x faster in the grid-world) while the return performance is kept or improved, compared with state-of-the-art meta RL methods. |
| Researcher Affiliation | Collaboration | 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China 2Polixir Technologies, Nanjing 210038, China 3Alibaba Group, Hangzhou 310052, China |
| Pseudocode | Yes | Algorithm 1: Training Process of ESCP |
| Open Source Code | Yes | We release our code on GitHub. |
| Open Datasets | No | We first conduct experiments on a manually designed grid-world environment. We also made comparisons on 5 MuJoCo tasks (Todorov, Erez, and Tassa 2012; Brockman et al. 2016) |
| Dataset Splits | No | At the training phase, we sample 16 environments with different contexts. At the testing phase, we also sample 16 environments and make them change suddenly. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or library versions) are explicitly mentioned in the paper. |
| Experiment Setup | No | While the paper describes the components of ESCP and mentions using SAC and Adam optimizer, it does not provide specific numerical hyperparameter values (e.g., learning rate, batch size, values for λ or η) or other detailed training configurations necessary for replication in the main text. |