Meta-Reinforcement Learning by Tracking Task Non-stationarity
Authors: Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on different simulated problems and show it outperforms competitive baselines. Our experiments aim at addressing the following questions: Does TRIO successfully track and anticipate changes in the latent variables governing the problem? How does it perform under different non-stationarities? What is the advantage w.r.t. methods that neglect the non-stationarity? How much better can an oracle that knows the task evolution process be? |
| Researcher Affiliation | Academia | Riccardo Poiani (Politecnico di Milano), Andrea Tirinzoni (Inria Lille), and Marcello Restelli (Politecnico di Milano) |
| Pseudocode | Yes | the pseudo-code of TRIO can be found in Algorithm 1 and 2. Algorithm 1 TRIO (meta-training) Algorithm 2 TRIO (meta-testing) |
| Open Source Code | No | The paper mentions "An extended version of the paper with appendix is available on arXiv." but does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | No | The paper uses simulated environments (Minigolf, MuJoCo benchmarks such as HalfCheetahVel and AntGoal) for experiments. While these environments are well known in the RL literature, the paper does not provide direct links, DOIs, specific repository names, or formal citations for publicly available datasets or for the specific configurations of these simulated environments. |
| Dataset Splits | No | The paper discusses meta-training and meta-testing phases but does not provide specific details on validation dataset splits, such as percentages, sample counts, or methodology for creating such splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory, or cluster specifications). |
| Software Dependencies | No | The paper mentions using "proximal policy optimization (PPO)" and "Gaussian process (GP)" as methodological components but does not list any specific software libraries, frameworks, or operating systems with their version numbers that are critical for reproducibility. |
| Experiment Setup | No | The paper describes the experimental domains (Minigolf, MuJoCo) and the types of non-stationarity. It mentions that policy optimization uses PPO. However, it does not provide specific hyperparameters such as learning rates, batch sizes, number of training epochs, or specific neural network architectures, which are crucial for replicating the experimental setup. |
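
For readers unfamiliar with the Gaussian-process tracking referenced in the Software Dependencies row, the following is a minimal, hedged sketch of how a GP could track and extrapolate a drifting latent task variable across a sequence of tasks. It is not the paper's TRIO implementation: the scikit-learn usage, the synthetic latent estimates, and the linear drift below are all illustrative assumptions.

```python
# Hedged sketch: track a drifting latent task variable with a GP and
# extrapolate it to the next task index. This only illustrates the kind of
# "track and anticipate" step the paper describes; it is not TRIO itself.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Suppose an inference module (not shown) produced one latent estimate per
# past task; here we fake a slowly drifting value with a little noise.
task_indices = np.arange(10).reshape(-1, 1)                  # tasks seen so far
latent_estimates = 1.0 + 0.1 * task_indices.ravel() \
    + 0.02 * np.random.randn(10)                             # noisy latent values

# Fit a GP over (task index -> latent value) to model the non-stationarity.
kernel = RBF(length_scale=3.0) + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(task_indices, latent_estimates)

# Anticipate the latent variable of the upcoming (unseen) task.
next_task = np.array([[10]])
mean, std = gp.predict(next_task, return_std=True)
print(f"predicted latent for next task: {mean[0]:.3f} +/- {std[0]:.3f}")
```

In the paper's setting the predicted latent would condition the policy for the next task, and policy optimization itself is handled by PPO; the sketch above only covers the tracking step.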