Meta Reinforcement Learning with Finite Training Tasks - a Density Estimation Approach
Authors: Zohar Rimon, Aviv Tamar, Gilad Adler
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we complement our theoretical results with an empirical investigation. Our goal is to show that our main idea of learning a KDE over a low dimensional space of tasks is effective also for state-of-the-art meta-RL algorithms, for which the linearity assumption of PCA clearly does not hold, and computing the optimal yet intractable π*_f̂ is replaced with an approximate deep RL method. |
| Researcher Affiliation | Collaboration | Zohar Rimon (Technion – Israel Institute of Technology, zohar.rimon@campus.technion.ac.il); Aviv Tamar (Technion – Israel Institute of Technology, avivt@technion.ac.il); Gilad Adler (Ford Research Center Israel, gadler3@ford.com) |
| Pseudocode | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary. |
| Open Source Code | Yes | Our code will be publicly available at https://github.com/zoharrimon/Meta-RL-KDE |
| Open Datasets | Yes | To visualize the advantage of our approach, consider the Half Circle domain in Figure 1, adapted from [3]: a 2-dimensional agent must navigate to a goal, located somewhere on the half-circle. |
| Dataset Splits | No | The paper mentions `Ntrain` training tasks and `Neval` evaluation tasks, but does not explicitly describe a separate validation split or its purpose for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper states 'All experiments were run on an internal cluster with NVIDIA GPUs.' in the supplementary material, but does not provide specific hardware details such as GPU model numbers, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'the VariBAD code base by Zintgraf et al. [39]' and 'the PPO implementation from the Stable Baselines3 library [13]', but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | The full implementation details and pseudo code are in Section A.10 of the supplementary. This section includes details such as 'We set the batch size to 256, number of episodes per batch to 8.' and 'We use a learning rate of 0.0003 for the policy and 0.001 for the VAE.' |
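The core idea quoted above, fitting a kernel density estimate (KDE) over a low-dimensional space of task parameters and sampling new training tasks from it, can be illustrated with a minimal sketch. This is not the paper's implementation: the `gaussian_kde_sample` helper, the bandwidth value, and the Half Circle goal layout below are all illustrative assumptions, using the standard trick that sampling from a Gaussian KDE amounts to picking a training point uniformly and adding Gaussian noise.

```python
import math
import random

def gaussian_kde_sample(train_tasks, bandwidth, n_samples, rng):
    """Sample task parameters from a Gaussian KDE fit on a finite
    set of training tasks (each task is a tuple of floats).

    Sampling from a Gaussian KDE = pick a kernel center uniformly
    at random, then perturb each coordinate with N(0, bandwidth^2).
    """
    samples = []
    for _ in range(n_samples):
        center = rng.choice(train_tasks)
        samples.append(tuple(c + rng.gauss(0.0, bandwidth) for c in center))
    return samples

# Illustrative Half Circle setup (adapted loosely from the domain the
# report quotes): 8 training goals spread over the unit half-circle.
rng = random.Random(0)
train_goals = [(math.cos(a), math.sin(a))
               for a in (i * math.pi / 7 for i in range(8))]

# Draw fresh goals from the KDE to train the meta-RL agent on,
# instead of reusing only the finite training set.
new_goals = gaussian_kde_sample(train_goals, bandwidth=0.1,
                                n_samples=5, rng=rng)
```

With a small bandwidth the sampled goals stay near the half-circle, so the agent sees novel but plausible tasks; the bandwidth trades off coverage of the task space against drifting off the true task distribution.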