Lipschitz Continuity in Model-based Reinforcement Learning
Authors: Kavosh Asadi, Dipendra Misra, Michael Littman
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with empirical results that show the benefits of controlling the Lipschitz constant of neural-network models. ... Our first goal in this section is to compare TV, KL, and Wasserstein in terms of the ability to best quantify error of an imperfect model. ... We performed empirical evaluations to understand the impact of Lipschitz continuity of transition models, specifically when the transition model is used to perform multi-step state-predictions and policy improvements. We chose two standard domains: Cart Pole and Pendulum. (A sketch of these three metrics appears after this table.) |
| Researcher Affiliation | Academia | ¹Department of Computer Science, Brown University, Providence, USA; ²Department of Computer Science and Cornell Tech, Cornell University, New York, USA. |
| Pseudocode | Yes | Algorithm 1 GVI algorithm |
| Open Source Code | Yes | We release the code here: github.com/kavosh8/Lip |
| Open Datasets | No | The paper mentions using 'Cart Pole and Pendulum' as standard domains and generating datasets such as '15 × 10³ tuples ⟨s, a, s′⟩' for training, but it does not provide concrete access information (link, DOI, or formal citation) showing that these datasets or the generated data are publicly available. |
| Dataset Splits | No | The paper states, 'chose the model with median cross-validation error,' indicating that cross-validation was used, but it does not provide specific dataset split information (exact percentages, sample counts, or a detailed splitting methodology) for that validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | During training, we ensured that the weights of the network are smaller than k. ... Notably, we used deterministic policy gradient (Silver et al., 2014) for training the policy network with the hyper-parameters suggested by Lillicrap et al. (2015). ... and with a fixed variance σ² tuned as a hyper-parameter. ... In all cases we used value iteration for planning. (A sketch of the weight constraint also appears after this table.) |
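
The Research Type row quotes the paper's comparison of TV, KL, and Wasserstein as ways to quantify the error of an imperfect model. The snippet below is a minimal sketch of how such a comparison could be run on one-dimensional next-state samples; the sample sources, distribution parameters, and bin edges are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def total_variation(p, q):
    """TV distance between two discrete distributions on the same bins."""
    return 0.5 * float(np.abs(p - q).sum())

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), with a small epsilon to keep the log finite."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

# Illustrative data: "true" next states vs. a slightly-off model's predictions.
rng = np.random.default_rng(0)
true_next = rng.normal(loc=0.0, scale=1.0, size=5000)
model_next = rng.normal(loc=0.1, scale=1.2, size=5000)

# TV and KL need discrete distributions, so bin the samples first.
bins = np.linspace(-5.0, 5.0, 51)
p, _ = np.histogram(true_next, bins=bins)
q, _ = np.histogram(model_next, bins=bins)
p = p / p.sum()
q = q / q.sum()

print("TV:         ", total_variation(p, q))
print("KL:         ", kl_divergence(p, q))
# The 1-D Wasserstein distance can be computed directly from raw samples.
print("Wasserstein:", wasserstein_distance(true_next, model_next))
```

Note that TV and KL are computed on the binned (discretized) distributions, while `scipy.stats.wasserstein_distance` works directly on the raw samples in the one-dimensional case.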
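
The Experiment Setup row quotes the constraint that "the weights of the network are smaller than k." The quoted sentence does not spell out the mechanism, so the sketch below assumes element-wise clipping of the weights to [-k, k] after every optimizer step; the model architecture, dimensions, and value of k are hypothetical.

```python
import torch
import torch.nn as nn

class TransitionModel(nn.Module):
    """Hypothetical deterministic transition model: (state, action) -> next state."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def clip_weights_(model, k):
    """Clip every parameter to [-k, k] so each layer's (and hence the
    network's) Lipschitz bound stays under control."""
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-k, k)

model = TransitionModel(state_dim=4, action_dim=1)   # Cart Pole-like dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
k = 0.5  # hypothetical bound; the paper treats k as a tunable constant

def train_step(states, actions, next_states):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(states, actions), next_states)
    loss.backward()
    optimizer.step()
    clip_weights_(model, k)  # enforce the weight bound after each update
    return loss.item()

# Example call with random tensors (batch of 32; shapes are illustrative).
s, a, s_next = torch.randn(32, 4), torch.randn(32, 1), torch.randn(32, 4)
print(train_step(s, a, s_next))
```

Because the Lipschitz constant of a feed-forward network with 1-Lipschitz activations is bounded by a product of per-layer weight norms, keeping individual weights small is one simple way to keep that bound under control; whether the released code at github.com/kavosh8/Lip enforces the constraint exactly this way is not stated in the quoted text.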