Investigating the Role of Model-Based Learning in Exploration and Transfer
Authors: Jacob C Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Theophane Weber, Jessica B Hamrick
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To conduct our experiments, we make a variety of implementation choices regarding model-based and model-free learning and intrinsic exploration; however, we emphasize that our contribution is less about these particular choices and more about the insights that our experiments bring. Overall, we find that model-based exploration combined with model-based fine-tuning results in better transfer performance than model-free baselines. More precisely, we show that: (1) Model-based methods perform better exploration than their model-free counterparts in reward-free environments. (2) Knowledge is transferred most effectively when performing model-based (as opposed to model-free) pre-training and fine-tuning. (3) System dynamics present in the world model seem to improve transfer performance. (4) The model-based advantage is stronger when the dynamics model is trained on the same environment. |
| Researcher Affiliation | Industry | ¹DeepMind, London, UK; ²Google Research. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | No explicit statement or link providing concrete access to source code for the methodology described in the paper was found. |
| Open Datasets | Yes | Crafter (Hafner, 2021) is a survival game inspired by the popular game Minecraft. RoboDesk (Kannan et al., 2021) is a control environment simulating a robotic arm interacting with a table and a fixed set of objects. Meta-World (Yu et al., 2020) is a robotic control suite of up to 50 tasks with a Sawyer arm. (Minimal environment-construction sketches for these three suites appear after the table.) |
| Dataset Splits | No | The paper describes training and fine-tuning phases and separates tasks into training and testing suites (e.g., Meta-World ML-10), but it does not report the percentages or sample counts needed to reproduce a conventional training/validation/test partition. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not name software dependencies with version numbers (e.g., Python, PyTorch, or other libraries) as needed for reproducibility. |
| Experiment Setup | Yes | Table 1 (Hyper-parameters for Model-based Pre-training and Fine-tuning) and Table 2 (Hyper-parameters for Model-free Pre-training and Fine-tuning) provide detailed hyperparameter values such as Model Unroll Length, TD-Steps, Replay Size, Initial Learning Rate, and Learning Rate Schedule. The batch size for training is 1024. (A hedged configuration sketch using these field names appears after the table.) |
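The three benchmark suites named in the Open Datasets row are all released as Python packages. The sketch below shows how they are typically instantiated, following the public APIs documented in the `crafter`, `robodesk`, and `metaworld` repositories; the RoboDesk task name and keyword arguments are illustrative defaults from that package's README, not settings confirmed by the paper.

```python
import random

import crafter      # https://github.com/danijar/crafter
import robodesk     # https://github.com/google-research/robodesk
import metaworld    # Meta-World benchmark package

# Crafter: a single open-ended survival environment.
crafter_env = crafter.Env()

# RoboDesk: one environment per task; "open_slide" with dense rewards is
# an illustrative default, not the paper's choice.
desk_env = robodesk.RoboDesk(task="open_slide", reward="dense")

# Meta-World ML-10: 10 training tasks plus 5 held-out test tasks.
ml10 = metaworld.ML10()
train_envs = []
for name, env_cls in ml10.train_classes.items():
    env = env_cls()
    task = random.choice([t for t in ml10.train_tasks if t.env_name == name])
    env.set_task(task)
    train_envs.append(env)
```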
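The Experiment Setup row names the hyperparameter fields but, in the excerpt above, quotes only one value (the batch size of 1024). A minimal configuration sketch using those field names is given below; every value other than `batch_size` is a placeholder, not the paper's setting, and the paper populates these fields separately for the model-based and model-free variants (Tables 1 and 2).

```python
from dataclasses import dataclass

@dataclass
class PretrainFinetuneConfig:
    """Field names follow Tables 1-2 of the paper; values marked as
    placeholders are illustrative only."""
    model_unroll_length: int = 5          # placeholder
    td_steps: int = 5                     # placeholder
    replay_size: int = 500_000            # placeholder (transitions)
    initial_learning_rate: float = 1e-4   # placeholder
    lr_schedule: str = "cosine_decay"     # placeholder; the paper specifies a schedule
    batch_size: int = 1024                # stated in the paper

config = PretrainFinetuneConfig()
```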