Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adapt to Environment Sudden Changes by Learning a Context Sensitive Policy
Authors: Fan-Ming Luo, Shengyi Jiang, Yang Yu, ZongZhang Zhang, Yi-Feng Zhang | Pages 7637-7646
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use a grid-world task and 5 locomotion controlling tasks with changing parameters to empirically assess our algorithm. Experiment results show that in environments with both in-distribution and out-of-distribution parameter changes, ESCP can not only better recover the environment encoding, but also adapt more rapidly to the post-change environment (10× faster in the grid-world) while the return performance is kept or improved, compared with state-of-the-art meta RL methods. |
| Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; (2) Polixir Technologies, Nanjing 210038, China; (3) Alibaba Group, Hangzhou 310052, China |
| Pseudocode | Yes | Algorithm 1: Training Process of ESCP |
| Open Source Code | Yes | We release our code at Github (footnote 1). |
| Open Datasets | No | We first conduct experiments on a manually designed grid-world environment. We also made comparisons on 5 MuJoCo tasks (Todorov, Erez, and Tassa 2012; Brockman et al. 2016). |
| Dataset Splits | No | At the training phase, we sample 16 environments with different contexts. At the testing phase, we also sample 16 environments and make them change suddenly. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or library versions) are explicitly mentioned in the paper. |
| Experiment Setup | No | While the paper describes the components of ESCP and mentions using SAC and the Adam optimizer, the main text does not provide specific numerical hyperparameter values (e.g., learning rate, batch size, values for λ or η) or other detailed training configurations necessary for replication. |
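
The Dataset Splits row quotes a protocol of 16 sampled training environments and 16 test environments whose parameters change suddenly. The sketch below illustrates one way such a protocol could be set up; it is not the authors' code. The parameter range `[0.5, 1.5]`, the horizon of 1000 steps, uniform sampling, a single change per episode, and all function names are assumptions made for illustration only.

```python
import random

def sample_contexts(n, low, high, rng):
    """Draw n scalar environment parameters uniformly from [low, high] (assumed distribution)."""
    return [rng.uniform(low, high) for _ in range(n)]

def sudden_change_schedule(n_envs, horizon, low, high, rng):
    """For each test environment, pick one change step and a post-change parameter."""
    schedule = []
    for start in sample_contexts(n_envs, low, high, rng):
        change_step = rng.randrange(1, horizon)  # step at which the parameter jumps
        schedule.append({
            "start": start,
            "change_step": change_step,
            "after": rng.uniform(low, high),
        })
    return schedule

def context_at(entry, t):
    """Return the environment parameter in effect at time step t."""
    return entry["start"] if t < entry["change_step"] else entry["after"]

# Hypothetical values: 16 environments per phase as quoted above; range and horizon are assumed.
rng = random.Random(0)
train_contexts = sample_contexts(16, 0.5, 1.5, rng)
test_schedule = sudden_change_schedule(16, 1000, 0.5, 1.5, rng)
```

Fixing the seed, as done here with `random.Random(0)`, is one simple way to make the sampled environments repeatable across runs; the summary above does not state whether the paper reports seeds.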