Model-Based Transfer Learning for Contextual Reinforcement Learning

Authors: Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally validate our methods using urban traffic and standard continuous control benchmarks. The experimental results suggest that MBTL can achieve up to 43x improved sample efficiency compared with canonical independent training and multi-task training.
Researcher Affiliation | Academia | Jung-Hoon Cho, MIT (jhooncho@mit.edu); Vindula Jayawardana, MIT (vindula@mit.edu); Sirui Li, MIT (siruil@mit.edu); Cathy Wu, MIT (cathywu@mit.edu)
Pseudocode | Yes | Appendix A.2: Model-Based Transfer Learning (MBTL) Algorithm. (A hedged sketch of this loop appears after the table.)
Open Source Code | Yes | Code is available at https://github.com/jhoon-cho/MBTL/.
Open Datasets | Yes | Our experiments consider CMDPs that span standard and real-world benchmarks. In particular, we consider standard continuous control benchmarks from the CARL library [5]. In addition, we study problems from RL for intelligent transportation systems, using [49] to model the CMDPs. ... We used the microscopic traffic simulation called Simulation of Urban MObility (SUMO) [26] v.1.16.0 ... License: CARL falls under the Apache License 2.0 as is permitted by all work that we use [5].
Dataset Splits | No | The paper specifies training on K source tasks and evaluating on N target tasks. For example: 'We evaluate our method by the average performance across all N target tasks after training up to K = 15 source tasks or the number of source tasks needed to achieve a certain level of performance.' However, it does not specify a separate validation split (e.g., percentages or counts for a held-out validation set) distinct from the training and evaluation tasks.
Hardware Specification | Yes | All experiments are done on a distributed computing cluster equipped with 48 Intel Xeon Platinum 8260 CPUs.
Software Dependencies | Yes | We used the microscopic traffic simulation called Simulation of Urban MObility (SUMO) [26] v.1.16.0 and PPO for RL algorithm [36]. We utilized the default implementation of the PPO algorithm with default hyperparameters provided by the Stable-Baselines3 library [34]. (A minimal sketch of this setup follows the table.)
Experiment Setup | Yes | We utilize Deep Q-Networks (DQN) for discrete action spaces [29] and Proximal Policy Optimization (PPO) for continuous action spaces [36]. For statistical reliability, we run each experiment three times with different random seeds. We employ min-max normalization of the rewards for each task, and we provide comprehensive details about our model in Appendix A.4.1. ... We used the Gaussian Process Regressor implementation from scikit-learn... We vary the GP hyperparameters, including noise standard deviation over the set {0.001, 0.01, 0.1, 1}, the number of restarts for the optimizer over {5, 6, ..., 15}, and explore several kernel configurations on the synthetic data. (A hedged sketch of this GP sweep follows the table.)
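
The Pseudocode row points to the MBTL algorithm in Appendix A.2. The following is a minimal sketch of that source-task selection loop, not the paper's exact procedure: it assumes a 1-D normalized context space, a generalization gap that grows linearly with context distance at a fixed slope `gap_slope`, a GP posterior mean in place of the paper's full Bayesian-optimization acquisition, and a hypothetical `train_and_evaluate` callable standing in for RL training on one source context.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def mbtl_select_sources(contexts, train_and_evaluate, K=15, gap_slope=1.0):
    """Greedily pick K source tasks from a 1-D context array (K < len(contexts))."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    trained_idx, perfs = [], []
    idx = len(contexts) // 2  # arbitrary first source task (middle of the range)

    for _ in range(K):
        # Train a policy on the chosen source context and record its
        # normalized training performance (hypothetical callable).
        trained_idx.append(idx)
        perfs.append(train_and_evaluate(contexts[idx]))
        xs, ys = contexts[trained_idx], np.array(perfs)

        # Current estimated generalization performance on each target task:
        # best trained source minus a gap assumed to grow linearly with
        # context distance.
        current = np.max(
            ys[None, :] - gap_slope * np.abs(contexts[:, None] - xs[None, :]), axis=1
        )

        # GP posterior mean predicts training performance at untrained sources.
        gp.fit(xs.reshape(-1, 1), ys)
        mu = gp.predict(contexts.reshape(-1, 1))

        # Simplified acquisition: total predicted improvement across all
        # target tasks if a candidate context were trained next.
        gains = np.array([
            np.maximum(mu[j] - gap_slope * np.abs(contexts - contexts[j]) - current, 0.0).sum()
            for j in range(len(contexts))
        ])
        gains[trained_idx] = -np.inf  # never re-select an already-trained task
        idx = int(np.argmax(gains))

    return contexts[trained_idx]
```

A call such as `mbtl_select_sources(np.linspace(0, 1, 50), lambda c: 1.0 - (c - 0.3) ** 2, K=15)` illustrates the intended usage with a toy performance function in place of actual RL training.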
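The Software Dependencies row reports PPO trained with the Stable-Baselines3 defaults. The snippet below is a minimal sketch of that setup; the environment name and timestep budget are placeholders rather than values from the paper, and the SUMO v1.16.0 traffic environments are not reproduced here.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Standard continuous-control task as a placeholder environment.
env = gym.make("Pendulum-v1")

# Default SB3 PPO hyperparameters (e.g., learning_rate=3e-4, n_steps=2048),
# matching the reported "default implementation ... with default hyperparameters".
model = PPO("MlpPolicy", env, verbose=1, seed=0)
model.learn(total_timesteps=100_000)  # timestep budget is a placeholder
model.save("ppo_pendulum")
```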
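The Experiment Setup row describes a scikit-learn GaussianProcessRegressor sweep over noise standard deviations, optimizer restarts, and kernel configurations. Below is a hedged sketch of such a sweep; the synthetic data, the specific kernel families, and the model-selection criterion (log marginal likelihood) are assumptions, not details taken from the paper.

```python
from itertools import product

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Placeholder synthetic data standing in for the paper's synthetic experiments.
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(4 * X).ravel() + 0.05 * rng.standard_normal(30)

kernels = [RBF(), Matern(nu=2.5), RationalQuadratic()]  # assumed kernel families
noise_stds = [0.001, 0.01, 0.1, 1]                      # reported noise std grid
restarts = range(5, 16)                                 # reported restart grid

best = None
for kernel, noise_std, n_restarts in product(kernels, noise_stds, restarts):
    gp = GaussianProcessRegressor(
        kernel=kernel,
        alpha=noise_std**2,  # scikit-learn adds noise *variance* to the kernel diagonal
        n_restarts_optimizer=n_restarts,
        normalize_y=True,
    )
    gp.fit(X, y)
    score = gp.log_marginal_likelihood_value_  # assumed selection criterion
    if best is None or score > best[0]:
        best = (score, kernel, noise_std, n_restarts)

print("best configuration:", best[1:], "log-marginal-likelihood:", best[0])
```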