Meta-Reinforcement Learning by Tracking Task Non-stationarity

Authors: Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the supporting LLM response.

Research Type: Experimental
LLM Response: The paper states: "We evaluate our algorithm on different simulated problems and show it outperforms competitive baselines." Its experiments aim to address the following questions: Does TRIO successfully track and anticipate changes in the latent variables governing the problem? How does it perform under different non-stationarities? What is the advantage w.r.t. methods that neglect the non-stationarity? How much better can an oracle that knows the task evolution process be?

Researcher Affiliation: Academia
LLM Response: Riccardo Poiani (Politecnico di Milano), Andrea Tirinzoni (Inria Lille), and Marcello Restelli (Politecnico di Milano).

Pseudocode: Yes
LLM Response: The pseudo-code of TRIO is given in Algorithm 1, "TRIO (meta-training)", and Algorithm 2, "TRIO (meta-testing)".

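Since the paper's pseudo-code is not reproduced here, the following is only a minimal illustrative sketch of the general idea the algorithms describe: infer the latent task variable from each trajectory and fit a Gaussian process over task indices to anticipate its next value. The callables policy, infer_latent, and collect_episode are hypothetical placeholders, and the scikit-learn GP is a stand-in for whatever model the authors actually use; this is not the paper's Algorithm 2.

    # Illustrative sketch only: a generic latent-tracking meta-test loop.
    # `policy`, `infer_latent`, and `collect_episode` are hypothetical callables.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def meta_test(env, policy, infer_latent, collect_episode, num_tasks, latent_dim):
        gp = GaussianProcessRegressor()
        latent_history = []  # one inferred latent vector per past task index
        for t in range(num_tasks):
            if latent_history:
                # Fit the GP on (task index -> inferred latent) pairs seen so far
                # and extrapolate to the current task index.
                X = np.arange(len(latent_history)).reshape(-1, 1)
                y = np.stack(latent_history)
                gp.fit(X, y)
                predicted_latent = gp.predict(np.array([[t]]))[0]
            else:
                predicted_latent = np.zeros(latent_dim)  # uninformative initial guess
            # Act with the latent-conditioned policy on the current task.
            trajectory = collect_episode(env, lambda obs: policy(obs, predicted_latent))
            # Infer the latent that was actually in effect and keep tracking it.
            latent_history.append(infer_latent(trajectory))
        return latent_history
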
Open Source Code: No
LLM Response: The paper mentions that "An extended version of the paper with appendix is available on arXiv", but it does not provide any statement or link regarding the availability of its source code.

Open Datasets: No
LLM Response: The paper uses simulated environments (Minigolf and MuJoCo benchmarks such as HalfCheetahVel and AntGoal) for its experiments. While these environments are well known in the RL literature, the paper does not provide direct links, DOIs, specific repository names, or formal citations for publicly available datasets, nor for the specific configurations of these simulated environments.

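Because the exact environment configurations are not specified, anyone re-implementing the benchmarks has to reconstruct them. Below is a minimal sketch of how a HalfCheetahVel-style task is commonly built, assuming a Gymnasium MuJoCo HalfCheetah whose reward is replaced by a target-velocity penalty; the wrapper name, control-cost coefficient, and target value are illustrative assumptions, not the paper's settings.

    # Hedged sketch: a target-velocity variant of HalfCheetah built as a wrapper.
    # The reward shape and coefficients below are assumptions, not the paper's.
    import gymnasium as gym
    import numpy as np

    class TargetVelocityWrapper(gym.Wrapper):
        def __init__(self, env, target_velocity):
            super().__init__(env)
            self.target_velocity = target_velocity

        def step(self, action):
            obs, _, terminated, truncated, info = self.env.step(action)
            # Gymnasium's MuJoCo HalfCheetah reports the forward velocity in info.
            velocity_error = abs(info["x_velocity"] - self.target_velocity)
            ctrl_cost = 0.05 * float(np.square(action).sum())  # illustrative penalty
            reward = -velocity_error - ctrl_cost
            return obs, reward, terminated, truncated, info

    env = TargetVelocityWrapper(gym.make("HalfCheetah-v4"), target_velocity=1.5)
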
Dataset Splits: No
LLM Response: The paper discusses meta-training and meta-testing phases but does not provide specific details on validation splits, such as percentages, sample counts, or the methodology used to create them.

Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory, or cluster specifications).

Software Dependencies: No
LLM Response: The paper mentions using proximal policy optimization (PPO) and Gaussian processes (GPs) as methodological components, but it does not list the specific software libraries, frameworks, or operating systems (with version numbers) that would be needed for reproducibility.

Experiment Setup: No
LLM Response: The paper describes the experimental domains (Minigolf, MuJoCo) and the types of non-stationarity, and it states that policy optimization uses PPO. However, it does not provide specific hyperparameters such as learning rates, batch sizes, number of training epochs, or the neural network architectures, which are crucial for replicating the experimental setup.

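For context on what a fully specified setup would look like, the snippet below trains a policy with an off-the-shelf PPO implementation and explicitly pinned hyperparameters. This is only a sketch under the assumption that Stable-Baselines3's PPO is an acceptable stand-in for the paper's (latent-conditioned) PPO training; the values shown are that library's documented defaults, not values reported in the paper.

    # Hedged example: PPO training with explicitly stated hyperparameters.
    # All values are illustrative (Stable-Baselines3 defaults), NOT the paper's.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("HalfCheetah-v4")
    model = PPO(
        "MlpPolicy",
        env,
        learning_rate=3e-4,  # shown explicitly for completeness
        n_steps=2048,
        batch_size=64,
        n_epochs=10,
        gamma=0.99,
        verbose=1,
    )
    model.learn(total_timesteps=1_000_000)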