Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

What Do Latent Action Models Actually Learn?

Authors: Chuheng Zhang, Tim Pearce, Pushi Zhang, Kaixin Wang, Xiaoyu Chen, Wei Shen, Li Zhao, Jiang Bian

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	These findings are validated through numerical simulations, as well as experiments in more realistic settings. This investigation is the first to rigorously investigate how the structure of observations, actions, and noise influence LAM learning. Section 5 verifies that the main findings based on linear LAM still hold on more realistic LAMs.
Researcher Affiliation	Collaboration	1Microsoft Research 2Tsinghua University 3Independent Researcher
Pseudocode	No	The paper describes the linear LAM model and then the model used for experiments in Section 5, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	At times, we will present numerical simulations of linear LAM to visually communicate later analysis. See the details of simulation in Appendix B and the code in supplementary.
Open Datasets	No	Dataset. We designed a 4×4 grid-world style synthetic dataset. The top 3×4 grid of the observation contains a square (intensity=1.0) that can be controlled with five actions (up, down, left, right, and stay still).
Dataset Splits	No	The paper describes the synthetic dataset and its generation, but does not explicitly provide training, validation, or test splits. It details the policy and evaluation metrics but not the data partitioning.
Hardware Specification	No	The paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications. The NeurIPS checklist mentions 'Our experiments only need a minimal computation set-up (e.g., a laptop)' but this is not specific enough.
Software Dependencies	No	We implement our system in PyTorch, optimizing trainable parameters via stochastic gradient descent with the Adam optimizer with batch size 128.
Experiment Setup	Yes	Optimization. We implement our system in PyTorch, optimizing trainable parameters via stochastic gradient descent with the Adam optimizer with batch size 128. We use the default learning rate and run for 4,000 steps to ensure convergence. Model. For the IDM, we use a small CNN to encode o and o′, followed by a VQ bottleneck with codebook size of 5 outputting the latent. Finally, for the FDM, a separate UNet takes the latent and previous observation o to output the predicted ô′. When predicting actions, codes are preassigned to actions, and latents are trained to minimize L2 distance to their true action code. Models were trained for 16k updates with the Adam optimizer. Unless specified, we use low stochastic noise, no action prediction, no data augmentation, and five codebook vectors.
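
The VQ bottleneck and the action-prediction objective quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the 2-D latent dimension, the codebook values, the action-to-code assignment, and the function names `quantize` and `action_code_loss` are all hypothetical and chosen only for illustration. The sketch shows the standard nearest-neighbor lookup of vector quantization over a five-entry codebook, and a squared L2 distance between a latent and the code preassigned to its true action.

```python
import math

# Hypothetical 2-D codebook with five entries, one per action.
# The paper's actual codebook is learned; these values are illustrative only.
CODEBOOK = [(0.0, 0.0), (0.0, 1.0), (0.0, -1.0), (-1.0, 0.0), (1.0, 0.0)]

# Hypothetical preassignment of codes to the five actions.
ACTION_TO_CODE = {"stay": 0, "up": 1, "down": 2, "left": 3, "right": 4}


def quantize(latent, codebook):
    """Return (index, code) of the codebook vector nearest to `latent` in L2 distance."""
    distances = [math.dist(latent, code) for code in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, codebook[index]


def action_code_loss(latent, codebook, action_to_code, action):
    """Squared L2 distance between `latent` and the code preassigned to `action`."""
    target = codebook[action_to_code[action]]
    return sum((z - c) ** 2 for z, c in zip(latent, target))


if __name__ == "__main__":
    index, code = quantize((0.1, 0.8), CODEBOOK)
    print("nearest code index:", index, "code:", code)
    print("loss vs. 'up' code:", action_code_loss((0.1, 0.8), CODEBOOK, ACTION_TO_CODE, "up"))
```

In a trained model the latent would come from the IDM encoder and the quantized code would be passed to the FDM; both networks are omitted here to keep the sketch self-contained.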