Investigating the Role of Model-Based Learning in Exploration and Transfer

Authors: Jacob C. Walker, Eszter Vértes, Yazhe Li, Gabriel Dulac-Arnold, Ankesh Anand, Théophane Weber, Jessica B. Hamrick

ICML 2023

Reproducibility assessment: each variable below is listed with its result, followed by the LLM response that supports it.

Research Type: Experimental
LLM Response: "To conduct our experiments, we make a variety of implementation choices regarding model-based and model-free learning and intrinsic exploration; however, we emphasize that our contribution is less about these particular choices and more about the insights that our experiments bring. Overall, we find that model-based exploration combined with model-based fine-tuning results in better transfer performance than model-free baselines. More precisely, we show that:
(1) Model-based methods perform better exploration than their model-free counterparts in reward-free environments.
(2) Knowledge is transferred most effectively when performing model-based (as opposed to model-free) pre-training and fine-tuning.
(3) System dynamics present in the world model seem to improve transfer performance.
(4) The model-based advantage is stronger when the dynamics model is trained on the same environment."

Researcher Affiliation: Industry
LLM Response: "DeepMind, London, UK; Google Research."

Pseudocode: No
LLM Response: "No explicit pseudocode or algorithm blocks were found in the paper."

Open Source Code: No
LLM Response: "No explicit statement or link providing concrete access to source code for the methodology described in the paper was found."

Open Datasets: Yes
LLM Response: "Crafter (Hafner, 2021) is a survival game inspired by the popular game Minecraft. RoboDesk (Kannan et al., 2021) is a control environment simulating a robotic arm interacting with a table and a fixed set of objects. Meta-World (Yu et al., 2020) is a robotic control suite of up to 50 tasks with a Sawyer arm."

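All three benchmarks are publicly available as Python packages. As a minimal sketch of how the Crafter environment is typically instantiated, assuming the public `crafter` package rather than the authors' (unreleased) experiment code:

```python
# Sketch: instantiating the open-source Crafter benchmark
# (https://github.com/danijar/crafter). Assumes the public `crafter`
# package API; this is not the authors' released code.
import crafter

env = crafter.Env()  # the paper's reward-free pre-training would ignore task reward
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random-policy placeholder
    obs, reward, done, info = env.step(action)
```

RoboDesk and Meta-World expose similar Gym-style reset/step interfaces.
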
Dataset Splits: No
LLM Response: "The paper describes training and fine-tuning phases and separates tasks into training and testing suites (e.g., Meta-World ML-10), but does not provide specific numerical percentages or sample counts for dataset splits (training/validation/test) needed to reproduce the data partitioning."

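The ML-10 split referenced above is fixed by the Meta-World benchmark itself (10 training tasks, 5 held-out test tasks), so it can be recovered even though the paper does not restate it. A sketch of enumerating that split with the open-source `metaworld` package (an assumption about tooling, not the authors' code):

```python
# Sketch: enumerating Meta-World's fixed ML-10 train/test task split with
# the public `metaworld` package. Benchmark tooling only; not the paper's code.
import metaworld

ml10 = metaworld.ML10()
print("train envs:", sorted(ml10.train_classes))  # 10 training tasks
print("test envs:", sorted(ml10.test_classes))    # 5 held-out test tasks

# Instantiate one training environment with a sampled task configuration.
name, env_cls = next(iter(ml10.train_classes.items()))
env = env_cls()
task = next(t for t in ml10.train_tasks if t.env_name == name)
env.set_task(task)
obs = env.reset()  # newer gymnasium-based releases return (obs, info)
```
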
Hardware Specification: No
LLM Response: "No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were mentioned in the paper."

Software Dependencies: No
LLM Response: "The paper mentions using the Adam optimizer but does not provide specific software names with version numbers (e.g., Python, PyTorch, or other libraries) needed for reproducibility."

Experiment Setup: Yes
LLM Response: "Table 1 (Hyper-parameters for Model-based Pre-training and Fine-tuning) and Table 2 (Hyper-parameters for Model-free Pre-training and Fine-tuning) provide detailed hyperparameter values such as Model Unroll Length, TD-Steps, Replay Size, Initial Learning Rate, and Learning Rate Schedule. The batch size for training is 1024."

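Since only the batch size (1024) and the hyperparameter names are quoted here, a reproduction attempt could start from a config skeleton that names those settings without inventing values. A minimal sketch; the field names mirror the paper's Tables 1-2, and every unquoted value is deliberately left unset:

```python
# Sketch: config skeleton mirroring the hyperparameter names from the paper's
# Tables 1-2. Only batch_size is stated in the text above; every other value
# is a placeholder to be filled in from the tables, not a guess.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentConfig:
    batch_size: int = 1024                         # stated: batch size is 1024
    model_unroll_length: Optional[int] = None      # see Table 1
    td_steps: Optional[int] = None                 # see Tables 1-2
    replay_size: Optional[int] = None              # see Tables 1-2
    initial_learning_rate: Optional[float] = None  # see Tables 1-2
    learning_rate_schedule: Optional[str] = None   # see Tables 1-2
    optimizer: str = "adam"                        # Adam is named in the paper

config = ExperimentConfig()
```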