Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach

Authors: Zohar Rimon, Aviv Tamar, Gilad Adler

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we complement our theoretical results with an empirical investigation. Our goal is to show that our main idea of learning a KDE over a low dimensional space of tasks is effective also for state-of-the-art meta-RL algorithms, for which the linearity assumption of PCA clearly does not hold, and computing the optimal yet intractable π_f̂ is replaced with an approximate deep RL method. (A minimal KDE sketch appears after the table.)
Researcher Affiliation | Collaboration | Zohar Rimon (Technion - Israel Institute of Technology, zohar.rimon@campus.technion.ac.il); Aviv Tamar (Technion - Israel Institute of Technology, avivt@technion.ac.il); Gilad Adler (Ford Research Center Israel, gadler3@ford.com)
Pseudocode | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary.
Open Source Code | Yes | Our code will be publicly available at https://github.com/zoharrimon/Meta-RL-KDE
Open Datasets | Yes | To visualize the advantage of our approach, consider the Half Circle domain in Figure 1, adapted from [3]: a 2-dimensional agent must navigate to a goal, located somewhere on the half-circle. (A task-sampling sketch appears after the table.)
Dataset Splits | No | The paper mentions `Ntrain` training tasks and `Neval` evaluation tasks, but does not explicitly describe a separate validation split or its purpose for hyperparameter tuning or early stopping.
Hardware Specification | No | The supplementary material states that 'All experiments were run on an internal cluster with NVIDIA GPUs', but the paper does not provide specific hardware details such as GPU model numbers, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using 'the VariBAD code base by Zintgraf et al. [39]' and 'the PPO implementation from the Stable Baselines3 library [13]', but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary. This section includes details such as 'We set the batch size to 256, number of episodes per batch to 8.' and 'We use a learning rate of 0.0003 for the policy and 0.001 for the VAE.' (A configuration sketch collecting these values appears after the table.)
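
The Research Type row above quotes the paper's core idea: fit a kernel density estimate (KDE) over a low-dimensional task space and train against tasks sampled from the estimate. The following is a minimal sketch of that recipe, not the authors' implementation; the scikit-learn calls, embedding dimension, bandwidth, and the synthetic `train_task_params` array are all illustrative assumptions.

```python
# Hypothetical sketch: fit a KDE over a low-dimensional embedding of the
# training tasks, then sample "new" tasks from the estimated density.
# Dimensions, bandwidth, and data below are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Stand-in for the parameters of Ntrain training tasks (e.g. goal positions).
train_task_params = rng.normal(size=(40, 10))

# Project tasks to a low-dimensional space (the paper's analysis assumes a
# linear projection; deep meta-RL methods would use a learned embedding).
pca = PCA(n_components=2)
low_dim_tasks = pca.fit_transform(train_task_params)

# Fit a Gaussian KDE over the low-dimensional task representations.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(low_dim_tasks)

# Sample task embeddings from the estimated density and map them back to
# the original parameter space; these sampled tasks would then be used to
# train an (approximately) optimal policy with a deep RL method.
sampled_low_dim = kde.sample(n_samples=8, random_state=0)
sampled_task_params = pca.inverse_transform(sampled_low_dim)
print(sampled_task_params.shape)  # (8, 10)
```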
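
The Open Datasets and Dataset Splits rows refer to the Half Circle navigation domain and to the `Ntrain` / `Neval` task sets. A minimal sketch of how such goal tasks could be sampled and split is below; the radius, task counts, and seed are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch: sample goals on a half circle and split them into
# Ntrain training tasks and Neval held-out evaluation tasks.
# The radius, task counts, and seed are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
radius = 1.0
n_train, n_eval = 20, 20

# A task is a 2-D goal on the upper half circle, parameterised by an angle.
angles = rng.uniform(0.0, np.pi, size=n_train + n_eval)
goals = np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

train_goals = goals[:n_train]   # tasks the meta-RL agent is trained on
eval_goals = goals[n_train:]    # unseen tasks used only for evaluation
print(train_goals.shape, eval_goals.shape)  # (20, 2) (20, 2)
```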
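
The Experiment Setup row quotes four hyperparameter values: batch size 256, 8 episodes per batch, a policy learning rate of 0.0003, and a VAE learning rate of 0.001. The small configuration sketch below simply collects those quoted numbers; the key names and dictionary layout are hypothetical and do not reflect the authors' actual configuration files.

```python
# Hypothetical config sketch collecting the hyperparameters quoted above.
# Only the four numeric values come from the paper; the key names are
# illustrative and do not mirror the authors' configuration schema.
config = {
    "batch_size": 256,        # quoted: "We set the batch size to 256"
    "episodes_per_batch": 8,  # quoted: "number of episodes per batch to 8"
    "policy_lr": 3e-4,        # quoted: learning rate of 0.0003 for the policy
    "vae_lr": 1e-3,           # quoted: learning rate of 0.001 for the VAE
}

print(config)
```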