Calibrated Model-Based Deep Reinforcement Learning
Authors: Ali Malik, Volodymyr Kuleshov, Jiaming Song, Danny Nemer, Harlan Seymour, Stefano Ermon
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach on benchmarks for contextual bandits and continuous control (Li et al., 2010; Todorov et al., 2012), as well as on a planning problem in inventory management (Van Roy et al., 1997). Our results show that calibration consistently improves the cumulative reward and the sample complexity of model-based agents, and also enhances their ability to balance exploration and exploitation in contextual bandit settings. |
| Researcher Affiliation | Collaboration | 1. Department of Computer Science, Stanford University, USA; 2. Afresh Technologies, San Francisco, USA. |
| Pseudocode | Yes | In Algorithm 1, we present a simple procedure that augments a model-based reinforcement learning algorithm with an extra step that ensures the calibration of its transition model. |
| Open Source Code | Yes | Our code is available at https://github.com/ermongroup/CalibratedModelBasedRL |
| Open Datasets | Yes | We evaluate the calibrated version (CalLinUCB) and uncalibrated version (LinUCB) of the LinUCB algorithm on both synthetic data that satisfies the linearity assumption of the algorithm, as well as on real UCI datasets from Li et al. (2010). We use the Corporación Favorita Kaggle dataset, which consists of historical sales from a supermarket chain in Ecuador. |
| Dataset Splits | Yes | We experiment on the 100 highest-selling items and use data from 2014-01-01 to 2016-05-31 for training and data from 2016-06-01 to 2016-08-31 for testing. |
| Hardware Specification | No | Insufficient information. The paper does not specify any hardware details such as CPU, GPU, or TPU models used for running the experiments. |
| Software Dependencies | No | Insufficient information. The paper mentions software environments like MuJoCo and OpenAI Gym, and models like Bayesian Dense Net, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | The Bayesian Dense Net has five layers of 128 hidden units with a dropout rate of 0.5 and parametric ReLU nonlinearities. We use variational dropout (Gal & Ghahramani, 2016b) to compute probabilistic forecasts from the model. We follow the training procedure and hyperparameters in Chua et al. (2018), as described at https://github.com/kchua/handful-of-trials. |
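The calibration step referenced in the Pseudocode row (Algorithm 1 recalibrates the agent's transition model) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it fits an isotonic regression that maps the model's predicted CDF values on a held-out calibration set to their empirical frequencies, the standard recalibration recipe for probabilistic regression models. The function name `fit_recalibrator` and the toy miscalibrated data are our own assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_recalibrator(predicted_cdf_values):
    """Fit an isotonic map R so that R(F(y)) is empirically calibrated.

    predicted_cdf_values: array of F_t(y_t), the model's CDF evaluated
    at each observed outcome on a held-out calibration set.
    """
    p = np.sort(np.asarray(predicted_cdf_values))
    # Empirical frequency of outcomes with predicted CDF value <= p
    emp = np.arange(1, len(p) + 1) / len(p)
    recal = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    recal.fit(p, emp)
    return recal

# Toy example: an overconfident model whose CDF values cluster near 0 and 1
rng = np.random.default_rng(0)
cdf_vals = rng.beta(0.5, 0.5, size=1000)   # stand-in for miscalibrated F_t(y_t)
recal = fit_recalibrator(cdf_vals)

# After recalibration the adjusted values are approximately uniform,
# i.e. roughly 90% of them fall at or below the 0.9 confidence level.
adjusted = recal.predict(np.sort(cdf_vals))
```

At planning time, the agent would pass every predicted CDF value through `recal` before sampling or computing expectations, so that stated confidence levels match observed frequencies.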
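The Open Datasets row refers to the LinUCB contextual bandit baseline of Li et al. (2010). A minimal sketch of the disjoint-model variant is below; the class name, toy reward structure, and hyperparameters are illustrative assumptions, not the paper's code.

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB (Li et al., 2010): one ridge regression
    per arm, with an upper-confidence exploration bonus."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted context sums

    def select(self, x):
        """Pick the arm maximizing theta^T x + alpha * sqrt(x^T A^-1 x)."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy run: arm 1's reward is linear in the first context feature, arm 0 pays 0
rng = np.random.default_rng(1)
bandit = LinUCB(n_arms=2, dim=3, alpha=0.5)
for _ in range(500):
    x = rng.normal(size=3)
    arm = bandit.select(x)
    reward = x[0] if arm == 1 else 0.0
    bandit.update(arm, x, reward)
```

After training, the bandit prefers arm 1 on contexts with a large positive first feature and arm 0 otherwise. The paper's contribution is to recalibrate the per-arm uncertainty estimates that enter the confidence bonus, giving the calibrated variant (CalLinUCB).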
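The Dataset Splits row describes a chronological train/test split of the Corporación Favorita sales data. A sketch of that split, assuming a pandas frame with a `date` column (the frame below is synthetic; only the cutoff dates come from the paper):

```python
import pandas as pd

# Synthetic stand-in covering the paper's date range (2014-01-01 to 2016-08-31)
dates = pd.date_range("2014-01-01", "2016-08-31", freq="D")
df = pd.DataFrame({"date": dates, "sales": range(len(dates))})

# Chronological split at the boundaries reported in the paper
train = df[(df["date"] >= "2014-01-01") & (df["date"] <= "2016-05-31")]
test = df[(df["date"] >= "2016-06-01") & (df["date"] <= "2016-08-31")]
```

A chronological split (rather than a random one) is the appropriate protocol here, since the task is forecasting future sales from past data.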
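The Experiment Setup row obtains probabilistic forecasts via variational (Monte Carlo) dropout: dropout is kept active at test time and repeated stochastic forward passes yield a predictive distribution. The sketch below uses a tiny untrained one-layer network as a stand-in for the paper's five-layer, 128-unit model; all weights and sizes are illustrative assumptions, and only the mechanism (inverted dropout left on at inference) follows Gal & Ghahramani (2016).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-hidden-layer network; weights are random stand-ins, not trained values
W1 = rng.normal(size=(8, 1)); b1 = np.zeros(8)
W2 = rng.normal(size=(1, 8)); b2 = np.zeros(1)

def forward(x, drop_rate=0.5):
    """One stochastic forward pass with dropout kept ON (MC dropout)."""
    h = np.maximum(W1 @ x + b1, 0.0)            # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate      # fresh Bernoulli dropout mask
    h = h * mask / (1.0 - drop_rate)            # inverted-dropout scaling
    return (W2 @ h + b2)[0]

# Repeated stochastic passes form an empirical predictive distribution
x = np.array([0.7])
samples = np.array([forward(x) for _ in range(2000)])
mu, sigma = samples.mean(), samples.std()       # probabilistic forecast
```

The resulting `(mu, sigma)` (or the sample set itself) is the kind of probabilistic transition forecast that Algorithm 1 then recalibrates.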