Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
What Do Latent Action Models Actually Learn?
Authors: Chuheng Zhang, Tim Pearce, Pushi Zhang, Kaixin Wang, Xiaoyu Chen, Wei Shen, Li Zhao, Jiang Bian
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These findings are validated through numerical simulations, as well as experiments in more realistic settings. This investigation is the first to rigorously investigate how the structure of observations, actions, and noise influence LAM learning. Section 5 verifies that the main findings based on linear LAM still hold on more realistic LAMs. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, ²Tsinghua University, ³Independent Researcher |
| Pseudocode | No | The paper describes the linear LAM model and then the model used for experiments in Section 5, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | At times, we will present numerical simulations of linear LAM to visually communicate later analysis. See the details of simulation in Appendix B and the code in supplementary. |
| Open Datasets | No | Dataset. We designed a 4×4 grid-world style synthetic dataset. The top 3×4 grid of the observation contains a square (intensity=1.0) that can be controlled with five actions (up, down, left, right, and stay still). |
| Dataset Splits | No | The paper describes the synthetic dataset and its generation, but does not explicitly provide training, validation, or test splits. It details the policy and evaluation metrics but not the data partitioning. |
| Hardware Specification | No | The paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications. The NeurIPS checklist mentions 'Our experiments only need a minimal computation set-up (e.g., a laptop)' but this is not specific enough. |
| Software Dependencies | No | We implement our system in PyTorch, optimizing trainable parameters via stochastic gradient descent with the Adam optimizer with batch size 128. |
| Experiment Setup | Yes | Optimization. We implement our system in PyTorch, optimizing trainable parameters via stochastic gradient descent with the Adam optimizer with batch size 128. We use the default learning rate and run for 4,000 steps to ensure convergence. Model. For the IDM, we use a small CNN to encode o and o', followed by a VQ bottleneck with codebook size of 5 outputting the latent. Finally, for the FDM, a separate UNet takes the latent and previous observation o to output the predicted ô'. When predicting actions, codes are preassigned to actions, and latents are trained to minimize L2 distance to their true action code. Models were trained for 16k updates with the Adam optimizer. Unless specified, we use low stochastic noise, no action prediction, no data augmentation, and five codebook vectors. |
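
The synthetic dataset quoted in the table (a 4×4 observation whose top 3×4 region holds a square of intensity 1.0, moved by five actions) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the single-cell square, the clipping at the region border, the all-zero bottom row, and the uniform random policy are all assumptions made for illustration; the source states only the grid sizes, the intensity, and the action set.

```python
import random

# Hypothetical sketch; only the grid sizes, intensity, and action set come from the source.
GRID_H, GRID_W = 4, 4   # full observation size
CTRL_H = 3              # the top 3x4 region holds the controllable square
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
    "stay": (0, 0),
}


def render(pos):
    """Return a 4x4 observation with one cell of intensity 1.0 at pos.

    Assumption: the square occupies a single cell and every other cell,
    including the bottom row, is zero.
    """
    obs = [[0.0] * GRID_W for _ in range(GRID_H)]
    row, col = pos
    obs[row][col] = 1.0
    return obs


def step(pos, action):
    """Apply an action; clipping at the region border is an assumption."""
    d_row, d_col = ACTIONS[action]
    row = min(max(pos[0] + d_row, 0), CTRL_H - 1)
    col = min(max(pos[1] + d_col, 0), GRID_W - 1)
    return (row, col)


def sample_transition(rng):
    """Sample one (o, a, o') triple under an assumed uniform random policy."""
    pos = (rng.randrange(CTRL_H), rng.randrange(GRID_W))
    action = rng.choice(list(ACTIONS))
    return render(pos), action, render(step(pos, action))


if __name__ == "__main__":
    rng = random.Random(0)
    o, a, o_next = sample_transition(rng)
    print(a)
    for row in o_next:
        print(row)
```

A latent action model would be trained on the `(o, o')` pairs alone, with the action label `a` held out for evaluation.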