Lipschitz Continuity in Model-based Reinforcement Learning

Authors: Kavosh Asadi, Dipendra Misra, Michael Littman

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude with empirical results that show the benefits of controlling the Lipschitz constant of neural-network models. ... Our first goal in this section is to compare TV, KL, and Wasserstein in terms of the ability to best quantify error of an imperfect model. ... We performed empirical evaluations to understand the impact of Lipschitz continuity of transition models, specifically when the transition model is used to perform multi-step state-predictions and policy improvements. We chose two standard domains: Cart Pole and Pendulum." (A sketch comparing the three metrics follows the table.)
Researcher Affiliation | Academia | "1 Department of Computer Science, Brown University, Providence, USA; 2 Department of Computer Science and Cornell Tech, Cornell University, New York, USA."
Pseudocode | Yes | "Algorithm 1 GVI algorithm" (A generic value-iteration sketch follows the table.)
Open Source Code | Yes | "We release the code here: github.com/kavosh8/Lip"
Open Datasets | No | The paper mentions using "Cart Pole and Pendulum" as standard domains and generating datasets such as "15 × 10^3 tuples ⟨s, a, s′⟩" for training, but it does not provide concrete access information (a link, DOI, or formal citation) for these datasets or for the generated data.
Dataset Splits | No | The paper states that the authors "chose the model with median cross-validation error," indicating that validation was used, but it does not give specific split information (exact percentages, sample counts, or a detailed splitting methodology).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "During training, we ensured that the weights of the network are smaller than k. ... Notably, we used deterministic policy gradient (Silver et al., 2014) for training the policy network with the hyper parameters suggested by Lillicrap et al. (2015). ... and with a fixed variance σ² tuned as a hyper-parameter. ... In all cases we used value iteration for planning." (A sketch of the weight-bound constraint follows the table.)
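For the metric comparison mentioned in the Research Type row, the sketch below computes TV, KL, and Wasserstein distances between two small discrete distributions. It is a minimal illustration of the three quantities, not the paper's experiment: the support points and the distributions p and q are invented for the example.

```python
# Minimal sketch: TV, KL, and Wasserstein distances between two discrete
# distributions. The distributions below are made up for illustration; the
# paper compares these metrics as ways to quantify the error of an imperfect
# transition model.
import numpy as np
from scipy.stats import entropy, wasserstein_distance

support = np.array([0.0, 1.0, 2.0, 3.0])   # common support points
p = np.array([0.4, 0.3, 0.2, 0.1])         # "true" next-state distribution
q = np.array([0.3, 0.3, 0.2, 0.2])         # model's predicted distribution

tv = 0.5 * np.abs(p - q).sum()                      # total variation distance
kl = entropy(p, q)                                  # KL divergence D(p || q)
w1 = wasserstein_distance(support, support, p, q)   # 1-D Wasserstein-1 distance

print(f"TV = {tv:.3f}  KL = {kl:.3f}  Wasserstein = {w1:.3f}")
```

A design note on why the comparison is interesting: TV and KL ignore the geometry of the support, whereas Wasserstein grows with how far probability mass has to move, which is the property the paper leverages when bounding multi-step prediction error.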
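The paper's Algorithm 1 (GVI) is available only as pseudocode in the paper itself; as a point of reference for the Pseudocode row, the sketch below is a plain tabular value-iteration loop from the same family of planners. It is not a transcription of the paper's algorithm, and the transition tensor P, reward table R, and discount gamma are placeholders.

```python
# Generic tabular value iteration, shown only to illustrate the flavour of a
# value-iteration-style planner such as the paper's Algorithm 1 (GVI); it is
# not a transcription of that pseudocode. P, R, and gamma are placeholders.
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6, max_iters=10_000):
    """P: (A, S, S) transition probabilities, R: (A, S) expected rewards."""
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q(a, s) = R(a, s) + gamma * E_{s'}[V(s')]
        Q = R + gamma * P @ V          # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Tiny random MDP purely for demonstration.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
R = rng.uniform(size=(2, 3))
print(value_iteration(P, R))
```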
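The Experiment Setup row quotes the constraint that network weights stay smaller than k, which is the mechanism used to control the model's Lipschitz constant. The sketch below shows one common way to implement such a constraint, clamping parameters to [-k, k] after each optimizer step; the network architecture, dimensions, bound k, and training step are illustrative placeholders rather than the authors' released implementation.

```python
# Sketch of keeping a transition model's weights bounded by clamping them to
# [-k, k] after each update, one standard way to enforce a weight constraint.
# The model, dimensions, data, and bound k are placeholders for illustration.
import torch
import torch.nn as nn

k = 0.5  # assumed weight bound (a tunable hyper-parameter)

model = nn.Sequential(nn.Linear(5, 64), nn.Tanh(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(state_actions, next_states):
    optimizer.zero_grad()
    loss = loss_fn(model(state_actions), next_states)
    loss.backward()
    optimizer.step()
    # Enforce the weight bound: keep every parameter within [-k, k].
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(-k, k)
    return loss.item()

# One illustrative update on random data (dimensions are placeholders).
fake_inputs = torch.randn(32, 5)    # e.g. state (4 dims) + action (1 dim)
fake_targets = torch.randn(32, 4)   # next state
print(train_step(fake_inputs, fake_targets))
```

The released repository linked in the table (github.com/kavosh8/Lip) remains the authoritative reference for how the constraint was actually applied.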