Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control
Authors: Nir Levine, Yinlam Chow, Rui Shu, Ang Li, Mohammad Ghavamzadeh, Hung Bui
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark domains demonstrate that the new variational-PCC learning algorithm benefits from significantly more stable and reproducible training, and leads to superior control performance. Further ablation studies give support to the importance of all three PCC components for learning a good latent space for control. From Section 6 (Experiments): 'In this section, we compare the performance of PCC with two model-based control algorithm baselines: RCE (Banijamali et al., 2018) and E2C (Watter et al., 2015), as well as running a thorough ablation study on various components of PCC.' |
| Researcher Affiliation | Collaboration | Nir Levine¹, Yinlam Chow², Rui Shu³, Ang Li¹, Mohammad Ghavamzadeh⁴, Hung Bui⁵ (¹DeepMind, ²Google Research, ³Stanford University, ⁴Facebook AI Research, ⁵VinAI) |
| Pseudocode | No | The paper describes the iLQR algorithm in text format within Appendix B, but it does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the PCC methodology itself. Footnote 8 provides a link to a demo video, not the source code: 'See a control demo on the TORCS simulator at https://youtu.be/GBrgALRZ2fw'. |
| Open Datasets | No | To generate our training and test sets, each consisting of triples (x_t, u_t, x_{t+1}), we: (1) sample an underlying state s_t and generate its corresponding observation x_t, (2) sample an action u_t, and (3) obtain the next state s_{t+1} according to the state transition dynamics, add to it zero-mean Gaussian noise with variance σ²·I_{n_s}, and generate the corresponding observation x_{t+1}. To ensure that the observation-action data is uniformly distributed (see Section 3), we sample the state-action pair (s_t, u_t) uniformly from the state-action space. (See the data-generation sketch after the table.) |
| Dataset Splits | No | The paper states, 'To generate our training and test sets', but does not mention specific training/validation/test splits or percentages. |
| Hardware Specification | Yes | Comparison jobs were deployed on the Planar system using Nvidia TITAN Xp GPU. |
| Software Dependencies | No | ADAM (Goodfellow et al., 2016) with α = 5·10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. L2 regularization with a coefficient of 10⁻³. Additional VAE (Kingma & Welling, 2013) loss term... While the paper mentions software like ADAM and VAE, it does not specify version numbers for these or any other libraries/packages. |
| Experiment Setup | Yes | Batch size of 128. ADAM (Goodfellow et al., 2016) with α = 5·10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. L2 regularization with a coefficient of 10⁻³. Additional VAE (Kingma & Welling, 2013) loss term given by ℓ_t^VAE = -E_{q(z|x)}[log p(x|z)] + D_KL(q(z|x) ‖ p(z)), where p(z) ~ N(0, 1). The term was added with a very small coefficient of 0.01. ... λ_p was set to 1 across all domains. λ_c was set to 7 across all domains... λ_cur was set to 1 across all domains... {z̄, ū}, for the curvature loss, were generated from {z, u} by adding Gaussian noise N(0, σ²), where σ = 0.1... (See the optimization sketch after the table.) |
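
The data-generation procedure quoted in the Open Datasets row is concrete enough to sketch. The following is a minimal illustration, not the authors' code: `transition` (the true state dynamics), `render` (the state-to-observation map), and the box-shaped state/action bounds are hypothetical placeholders for domain-specific pieces the paper does not specify.

```python
import numpy as np

def generate_triples(transition, render, state_bounds, action_bounds,
                     n_samples, sigma, seed=0):
    """Sample (x_t, u_t, x_{t+1}) triples: uniform (s_t, u_t), a noisy
    transition, then observations rendered from both states."""
    rng = np.random.default_rng(seed)
    s_lo, s_hi = state_bounds    # per-dimension bounds, arrays of shape (n_s,)
    u_lo, u_hi = action_bounds   # per-dimension bounds, arrays of shape (n_u,)
    triples = []
    for _ in range(n_samples):
        # Sample the state-action pair uniformly from the state-action space,
        # so the observation-action data is uniformly distributed.
        s_t = rng.uniform(s_lo, s_hi)
        u_t = rng.uniform(u_lo, u_hi)
        # Next state from the dynamics plus zero-mean Gaussian noise with
        # variance sigma^2 * I_{n_s}.
        s_next = transition(s_t, u_t) + sigma * rng.standard_normal(s_t.shape)
        triples.append((render(s_t), u_t, render(s_next)))
    return triples
```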
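
Likewise, the hyperparameters quoted in the Experiment Setup row pin down the optimizer and loss weighting closely enough for a sketch. Below is a minimal PyTorch rendering under stated assumptions: the paper releases no code, so `model.losses` and its four-way decomposition are placeholders, and Adam's `weight_decay` stands in for the reported L2 coefficient.

```python
import torch

def make_optimizer(model):
    # ADAM with alpha = 5e-4, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    # the reported L2 coefficient of 1e-3 is applied as weight decay.
    return torch.optim.Adam(model.parameters(), lr=5e-4,
                            betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-3)

def pcc_objective(model, x_t, u_t, x_next):
    # Hypothetical decomposition mirroring the reported coefficients:
    # lambda_p = 1 (prediction), lambda_c = 7 (consistency),
    # lambda_cur = 1 (curvature), plus the VAE term weighted by 0.01.
    pred, cons, cur, vae = model.losses(x_t, u_t, x_next)
    return pred + 7.0 * cons + cur + 0.01 * vae

def perturb_for_curvature(z, u, sigma=0.1):
    # {z_bar, u_bar} for the curvature loss: {z, u} plus Gaussian noise
    # drawn from N(0, sigma^2) with sigma = 0.1.
    return z + sigma * torch.randn_like(z), u + sigma * torch.randn_like(u)
```

Training would then iterate over batches of size 128, computing `pcc_objective` and stepping the optimizer returned by `make_optimizer`.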