Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy

Authors: Fan-Ming Luo, Shengyi Jiang, Yang Yu, ZongZhang Zhang, Yi-Feng Zhang (pp. 7637-7646)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use a grid-world task and 5 locomotion control tasks with changing parameters to empirically assess our algorithm. Experiment results show that in environments with both in-distribution and out-of-distribution parameter changes, ESCP can not only better recover the environment encoding, but also adapt more rapidly to the post-change environment (10× faster in the grid-world) while the return performance is kept or improved, compared with state-of-the-art meta-RL methods.
Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; 2 Polixir Technologies, Nanjing 210038, China; 3 Alibaba Group, Hangzhou 310052, China
Pseudocode | Yes | Algorithm 1: Training Process of ESCP
Open Source Code | Yes | We release our code at GitHub.
Open Datasets | No | We first conduct experiments on a manually designed grid-world environment. We also made comparisons on 5 MuJoCo tasks (Todorov, Erez, and Tassa 2012; Brockman et al. 2016).
Dataset Splits | No | At the training phase, we sample 16 environments with different contexts. At the testing phase, we also sample 16 environments and make them change suddenly.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or library versions) are explicitly mentioned in the paper.
Experiment Setup | No | While the paper describes the components of ESCP and mentions using SAC and the Adam optimizer, it does not provide specific numerical hyperparameter values (e.g., learning rate, batch size, values for λ or η) or other detailed training configurations necessary for replication in the main text.