Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Vector Quantization in the Brain: Grid-like Codes in World Models

Authors: Xiangyuan Peng, Xingsi Dong, Si Wu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	As a spatiotemporal compression model, GCQ is first evaluated in ablation studies to demonstrate its ability to compress and reconstruct observations. We then show that GCQ, when used as a world model, supports long-horizon prediction, goal-directed planning, and inverse modeling. Compared to traditional two-stage models, GCQ exhibits superior performance in long-range prediction tasks.
Researcher Affiliation	Academia	Xiangyuan Peng EMAIL Xingsi Dong EMAIL Si Wu EMAIL PKU-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies. IDG/McGovern Institute for Brain Research, Center of Quantitative Biology, Peking University.
Pseudocode	Yes	C Python Implementation Our Python implementation of GCQ is fully vectorized, relying exclusively on matrix operations without any for loops. This design makes it highly amenable to parallelization. def forward(self, latents: Tensor, label: Tensor) -> Tuple[Tensor, Tensor]:
Open Source Code	Yes	5. Open access to data and code Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We have included the code for reproducing the main results in supplementary material.
Open Datasets	Yes	Datasets. We evaluate GCQ on four datasets, all of which contain image-based observations. The 2DMaze [47] dataset is a virtual environment where actions correspond to the agent's movements. ... The Google Street View (GSV) dataset represents real-world environments with partial observations... In the MPI3D [48] and 3DShapes [49] datasets, actions are defined as abstract feature-level changes.
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits (percentages or counts). It mentions the datasets used but not how they were partitioned for experiments.
Hardware Specification	Yes	Here are the hyperparameters used in the experiment. All programs run on an NVIDIA A100-SXM4-80GB.
Software Dependencies	No	The paper mentions 'Python Implementation' and uses 'torch' in the code snippet, but does not specify version numbers for Python, PyTorch, or any other software dependencies.
Experiment Setup	Yes	Here are the hyperparameters used in the experiment. All programs run on an NVIDIA A100-SXM4-80GB. The experiments reported in this paper, including the ViT, ResNet, and Hybrid networks, required 8-12 hours of training each. For ViT and hybrid architectures, we trained for 40 epochs with a learning rate of 1e-4; for the ResNet network, we trained for 100 epochs with a learning rate of 3e-4. All training runs used the Adam optimizer.
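
The training settings quoted in the Experiment Setup row could be recorded as a small lookup table, for example when attempting a reproduction. The following is a minimal sketch, not the authors' code: the epoch counts, learning rates, and optimizer name come from the row above, while the class name `TrainConfig`, the dictionary `TRAIN_CONFIGS`, the function `get_train_config`, and the architecture keys are hypothetical names chosen for illustration. Settings the row does not report (batch size, weight decay, schedules) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Training settings reported for one architecture (hypothetical container)."""
    epochs: int
    learning_rate: float
    optimizer: str = "Adam"  # the row states that all runs used Adam


# Values taken from the Experiment Setup row; the keys are illustrative labels.
TRAIN_CONFIGS = {
    "vit": TrainConfig(epochs=40, learning_rate=1e-4),
    "hybrid": TrainConfig(epochs=40, learning_rate=1e-4),
    "resnet": TrainConfig(epochs=100, learning_rate=3e-4),
}


def get_train_config(architecture: str) -> TrainConfig:
    """Look up the reported settings, ignoring case in the architecture name."""
    key = architecture.lower()
    if key not in TRAIN_CONFIGS:
        raise KeyError(f"No reported settings for architecture: {architecture!r}")
    return TRAIN_CONFIGS[key]


if __name__ == "__main__":
    for name in ("ViT", "Hybrid", "ResNet"):
        cfg = get_train_config(name)
        print(f"{name}: {cfg.epochs} epochs, lr={cfg.learning_rate}, {cfg.optimizer}")
```

Keeping the reported values in one frozen structure makes it easier to see which settings are documented and which would still have to be guessed.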