Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Authors: Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control. From Section 6 (Experiments): 'In this section, we compare the performance of PCC with two model-based control algorithm baselines: RCE (Banijamali et al., 2018) and E2C (Watter et al., 2015), as well as running a thorough ablation study on various components of PCC.' |
| Researcher Affiliation | Collaboration | Nir Levine¹, Yinlam Chow², Rui Shu³, Ang Li¹, Mohammad Ghavamzadeh⁴, Hung Bui⁵ (¹DeepMind, ²Google Research, ³Stanford University, ⁴Facebook AI Research, ⁵VinAI) |
| Pseudocode | No | The paper describes the iLQR algorithm in text format within Appendix B, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the PCC methodology itself. Footnote 8 provides a link to a demo video, not the source code: 'See a control demo on the TORCS simulator at https://youtu.be/GBrgALRZ2fw'. |
| Open Datasets | No | To generate our training and test sets, each consisting of triples (x_t, u_t, x_{t+1}), we: (1) sample an underlying state s_t and generate its corresponding observation x_t, (2) sample an action u_t, and (3) obtain the next state s_{t+1} according to the state transition dynamics, add to it zero-mean Gaussian noise with variance σ²·I_{n_s}, and generate the corresponding observation x_{t+1}. To ensure that the observation-action data is uniformly distributed (see Section 3), we sample the state-action pair (s_t, u_t) uniformly from the state-action space. (See the data-generation sketch after the table.) |
| Dataset Splits | No | The paper states, 'To generate our training and test sets', but does not mention specific training/validation/test splits or percentages. |
| Hardware Specification | Yes | Comparison jobs were deployed on the Planar system using Nvidia TITAN Xp GPU. |
| Software Dependencies | No | ADAM (Goodfellow et al., 2016) with α = 5·10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. L2 regularization with a coefficient of 10⁻³. Additional VAE (Kingma & Welling, 2013) loss term... While the paper mentions software like ADAM and VAE, it does not specify version numbers for these or any other libraries/packages. |
| Experiment Setup | Yes | Batch size of 128. ADAM (Goodfellow et al., 2016) with α = 5·10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. L2 regularization with a coefficient of 10⁻³. Additional VAE (Kingma & Welling, 2013) loss term given by ℓ_t^VAE = -E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) ‖ p(z)), where p(z) ~ N(0, 1). The term was added with a very small coefficient of 0.01. ... λ_p was set to 1 across all domains. λ_c was set to 7 across all domains... λ_cur was set to 1 across all domains... {z̄, ū}, for the curvature loss, were generated from {z, u} by adding Gaussian noise N(0, σ²), where σ = 0.1... (See the optimization sketch after the table.) |
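
The data-generation procedure quoted in the Open Datasets row is concrete enough to sketch. The following is a minimal illustration, not the authors' code: `transition` (the true state dynamics), `render` (the state-to-observation map), and the box-shaped state/action bounds are hypothetical placeholders for domain-specific pieces the paper does not specify.

```python
import numpy as np

def generate_triples(transition, render, state_bounds, action_bounds,
                     n_samples, sigma, seed=0):
    """Sample (x_t, u_t, x_{t+1}) triples: uniform (s_t, u_t), a noisy
    transition, then observations rendered from both states."""
    rng = np.random.default_rng(seed)
    s_lo, s_hi = state_bounds    # per-dimension bounds, arrays of shape (n_s,)
    u_lo, u_hi = action_bounds   # per-dimension bounds, arrays of shape (n_u,)
    triples = []
    for _ in range(n_samples):
        # Sample the state-action pair uniformly from the state-action space,
        # so the observation-action data is uniformly distributed.
        s_t = rng.uniform(s_lo, s_hi)
        u_t = rng.uniform(u_lo, u_hi)
        # Next state from the dynamics plus zero-mean Gaussian noise with
        # variance sigma^2 * I_{n_s}.
        s_next = transition(s_t, u_t) + sigma * rng.standard_normal(s_t.shape)
        triples.append((render(s_t), u_t, render(s_next)))
    return triples
```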
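
Likewise, the hyperparameters quoted in the Experiment Setup row pin down the optimizer and loss weighting closely enough for a sketch. Below is a minimal PyTorch rendering under stated assumptions: the paper releases no code, so `model.losses` and its four-way decomposition are placeholders, and Adam's `weight_decay` stands in for the reported L2 coefficient.

```python
import torch

def make_optimizer(model):
    # ADAM with alpha = 5e-4, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    # the reported L2 coefficient of 1e-3 is applied as weight decay.
    return torch.optim.Adam(model.parameters(), lr=5e-4,
                            betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-3)

def pcc_objective(model, x_t, u_t, x_next):
    # Hypothetical decomposition mirroring the reported coefficients:
    # lambda_p = 1 (prediction), lambda_c = 7 (consistency),
    # lambda_cur = 1 (curvature), plus the VAE term weighted by 0.01.
    pred, cons, cur, vae = model.losses(x_t, u_t, x_next)
    return pred + 7.0 * cons + cur + 0.01 * vae

def perturb_for_curvature(z, u, sigma=0.1):
    # {z_bar, u_bar} for the curvature loss: {z, u} plus Gaussian noise
    # drawn from N(0, sigma^2) with sigma = 0.1.
    return z + sigma * torch.randn_like(z), u + sigma * torch.randn_like(u)
```

Training would then iterate over batches of size 128, computing `pcc_objective` and stepping the optimizer returned by `make_optimizer`.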