Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach

Authors: Zohar Rimon, Aviv Tamar, Gilad Adler

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we complement our theoretical results with an empirical investigation. Our goal is to show that our main idea of learning a KDE over a low dimensional space of tasks is effective also for state-of-the-art meta-RL algorithms, for which the linearity assumption of PCA clearly does not hold, and computing the optimal yet intractable π_f̂ is replaced with an approximate deep RL method. (A minimal KDE sketch appears after the table.)
Researcher Affiliation | Collaboration | Zohar Rimon (Technion - Israel Institute of Technology, zohar.rimon@campus.technion.ac.il); Aviv Tamar (Technion - Israel Institute of Technology, avivt@technion.ac.il); Gilad Adler (Ford Research Center Israel, gadler3@ford.com)
Pseudocode | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary.
Open Source Code | Yes | Our code will be publicly available at https://github.com/zoharrimon/Meta-RL-KDE
Open Datasets | Yes | To visualize the advantage of our approach, consider the Half Circle domain in Figure 1, adapted from [3]: a 2-dimensional agent must navigate to a goal, located somewhere on the half-circle. (A task-sampling sketch appears after the table.)
Dataset Splits | No | The paper mentions `Ntrain` training tasks and `Neval` evaluation tasks, but does not explicitly describe a separate validation split or its purpose for hyperparameter tuning or early stopping.
Hardware Specification | No | The supplementary material states that 'All experiments were run on an internal cluster with NVIDIA GPUs', but the paper does not provide specific hardware details such as GPU model numbers, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using 'the VariBAD code base by Zintgraf et al. [39]' and 'the PPO implementation from the Stable Baselines3 library [13]', but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary. This section includes details such as 'We set the batch size to 256, number of episodes per batch to 8.' and 'We use a learning rate of 0.0003 for the policy and 0.001 for the VAE.' (A configuration sketch collecting these values appears after the table.)
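
The Research Type row above quotes the paper's core idea: fit a kernel density estimate (KDE) over a low-dimensional task space and train against tasks sampled from the estimate. The following is a minimal sketch of that recipe, not the authors' implementation; the scikit-learn calls, embedding dimension, bandwidth, and the synthetic `train_task_params` array are all illustrative assumptions.

```python
# Hypothetical sketch: fit a KDE over a low-dimensional embedding of the
# training tasks, then sample "new" tasks from the estimated density.
# Dimensions, bandwidth, and data below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Stand-in for the parameters of Ntrain training tasks (e.g. goal positions).
train_task_params = rng.normal(size=(40, 10))

# Project tasks to a low-dimensional space (the paper's analysis assumes a
# linear projection; deep meta-RL methods would use a learned embedding).
pca = PCA(n_components=2)
low_dim_tasks = pca.fit_transform(train_task_params)

# Fit a Gaussian KDE over the low-dimensional task representations.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(low_dim_tasks)

# Sample task embeddings from the estimated density and map them back to
# the original parameter space; these sampled tasks would then be used to
# train an (approximately) optimal policy with a deep RL method.
sampled_low_dim = kde.sample(n_samples=8, random_state=0)
sampled_task_params = pca.inverse_transform(sampled_low_dim)
print(sampled_task_params.shape)  # (8, 10)
```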
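
The Open Datasets and Dataset Splits rows refer to the Half Circle navigation domain and to the `Ntrain` / `Neval` task sets. A minimal sketch of how such goal tasks could be sampled and split is below; the radius, task counts, and seed are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch: sample goals on a half circle and split them into
# Ntrain training tasks and Neval held-out evaluation tasks.
# The radius, task counts, and seed are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
radius = 1.0
n_train, n_eval = 20, 20

# A task is a 2-D goal on the upper half circle, parameterised by an angle.
angles = rng.uniform(0.0, np.pi, size=n_train + n_eval)
goals = np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

train_goals = goals[:n_train]   # tasks the meta-RL agent is trained on
eval_goals = goals[n_train:]    # unseen tasks used only for evaluation
print(train_goals.shape, eval_goals.shape)  # (20, 2) (20, 2)
```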
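
The Experiment Setup row quotes four hyperparameter values: batch size 256, 8 episodes per batch, a policy learning rate of 0.0003, and a VAE learning rate of 0.001. The small configuration sketch below simply collects those quoted numbers; the key names and dictionary layout are hypothetical and do not reflect the authors' actual configuration files.

```python
# Hypothetical config sketch collecting the hyperparameters quoted above.
# Only the four numeric values come from the paper; the key names are
# illustrative and do not mirror the authors' configuration schema.
config = {
    "batch_size": 256,        # quoted: "We set the batch size to 256"
    "episodes_per_batch": 8,  # quoted: "number of episodes per batch to 8"
    "policy_lr": 3e-4,        # quoted: learning rate of 0.0003 for the policy
    "vae_lr": 1e-3,           # quoted: learning rate of 0.001 for the VAE
}

print(config)
```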