Lipschitz Continuity in Model-based Reinforcement Learning

Authors: Kavosh Asadi, Dipendra Misra, Michael Littman

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conclude with empirical results that show the benefits of controlling the Lipschitz constant of neural-network models. ... Our first goal in this section is to compare TV, KL, and Wasserstein in terms of the ability to best quantify error of an imperfect model. ... We performed empirical evaluations to understand the impact of Lipschitz continuity of transition models, specifically when the transition model is used to perform multi-step state-predictions and policy improvements. We chose two standard domains: Cart Pole and Pendulum." (A sketch comparing the three metrics follows the table.)
Researcher Affiliation | Academia | "1 Department of Computer Science, Brown University, Providence, USA; 2 Department of Computer Science and Cornell Tech, Cornell University, New York, USA."
Pseudocode | Yes | "Algorithm 1 GVI algorithm" (A generic value-iteration sketch follows the table.)
Open Source Code | Yes | "We release the code here: github.com/kavosh8/Lip"
Open Datasets | No | The paper mentions using "Cart Pole and Pendulum" as standard domains and generating datasets such as "15 × 10^3 tuples ⟨s, a, s′⟩" for training, but it does not provide concrete access information (a link, DOI, or formal citation) for these datasets or for the generated data.
Dataset Splits | No | The paper states that the authors "chose the model with median cross-validation error," indicating that validation was used, but it does not give specific split information (exact percentages, sample counts, or a detailed splitting methodology).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "During training, we ensured that the weights of the network are smaller than k. ... Notably, we used deterministic policy gradient (Silver et al., 2014) for training the policy network with the hyper parameters suggested by Lillicrap et al. (2015). ... and with a fixed variance σ² tuned as a hyper-parameter. ... In all cases we used value iteration for planning." (A sketch of the weight-bound constraint follows the table.)
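For the metric comparison mentioned in the Research Type row, the sketch below computes TV, KL, and Wasserstein distances between two small discrete distributions. It is a minimal illustration of the three quantities, not the paper's experiment: the support points and the distributions p and q are invented for the example.

```python
# Minimal sketch: TV, KL, and Wasserstein distances between two discrete
# distributions. The distributions below are made up for illustration; the
# paper compares these metrics as ways to quantify the error of an imperfect
# transition model.
import numpy as np
from scipy.stats import entropy, wasserstein_distance

support = np.array([0.0, 1.0, 2.0, 3.0])   # common support points
p = np.array([0.4, 0.3, 0.2, 0.1])         # "true" next-state distribution
q = np.array([0.3, 0.3, 0.2, 0.2])         # model's predicted distribution

tv = 0.5 * np.abs(p - q).sum()                      # total variation distance
kl = entropy(p, q)                                  # KL divergence D(p || q)
w1 = wasserstein_distance(support, support, p, q)   # 1-D Wasserstein-1 distance

print(f"TV = {tv:.3f}  KL = {kl:.3f}  Wasserstein = {w1:.3f}")
```

A design note on why the comparison is interesting: TV and KL ignore the geometry of the support, whereas Wasserstein grows with how far probability mass has to move, which is the property the paper leverages when bounding multi-step prediction error.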
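The paper's Algorithm 1 (GVI) is available only as pseudocode in the paper itself; as a point of reference for the Pseudocode row, the sketch below is a plain tabular value-iteration loop from the same family of planners. It is not a transcription of the paper's algorithm, and the transition tensor P, reward table R, and discount gamma are placeholders.

```python
# Generic tabular value iteration, shown only to illustrate the flavour of a
# value-iteration-style planner such as the paper's Algorithm 1 (GVI); it is
# not a transcription of that pseudocode. P, R, and gamma are placeholders.
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-6, max_iters=10_000):
    """P: (A, S, S) transition probabilities, R: (A, S) expected rewards."""
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q(a, s) = R(a, s) + gamma * E_{s'}[V(s')]
        Q = R + gamma * P @ V          # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Tiny random MDP purely for demonstration.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
R = rng.uniform(size=(2, 3))
print(value_iteration(P, R))
```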
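The Experiment Setup row quotes the constraint that network weights stay smaller than k, which is the mechanism used to control the model's Lipschitz constant. The sketch below shows one common way to implement such a constraint, clamping parameters to [-k, k] after each optimizer step; the network architecture, dimensions, bound k, and training step are illustrative placeholders rather than the authors' released implementation.

```python
# Sketch of keeping a transition model's weights bounded by clamping them to
# [-k, k] after each update, one standard way to enforce a weight constraint.
# The model, dimensions, data, and bound k are placeholders for illustration.
import torch
import torch.nn as nn

k = 0.5  # assumed weight bound (a tunable hyper-parameter)

model = nn.Sequential(nn.Linear(5, 64), nn.Tanh(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(state_actions, next_states):
    optimizer.zero_grad()
    loss = loss_fn(model(state_actions), next_states)
    loss.backward()
    optimizer.step()
    # Enforce the weight bound: keep every parameter within [-k, k].
    with torch.no_grad():
        for param in model.parameters():
            param.clamp_(-k, k)
    return loss.item()

# One illustrative update on random data (dimensions are placeholders).
fake_inputs = torch.randn(32, 5)    # e.g. state (4 dims) + action (1 dim)
fake_targets = torch.randn(32, 4)   # next state
print(train_step(fake_inputs, fake_targets))
```

The released repository linked in the table (github.com/kavosh8/Lip) remains the authoritative reference for how the constraint was actually applied.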