Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neural Language of Thought Models
Authors: Yi-Fu Wu, Minseung Lee, Sungjin Ahn
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NLoTM on several 2D and 3D image datasets, demonstrating superior performance in downstream tasks, out-of-distribution generalization, and image generation quality compared to patch-based VQ-VAE and continuous object-centric representations. |
| Researcher Affiliation | Academia | Yi-Fu Wu¹, Minseung Lee², Sungjin Ahn²; ¹Rutgers University, ²KAIST |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | We will also release the source code upon acceptance of the paper. |
| Open Datasets | Yes | We evaluate our model on two variants of a 2D Sprites dataset (Watters et al., 2019a; Yoon et al., 2023) and three variants of the CLEVR dataset (Johnson et al., 2017), CLEVR-Easy, CLEVR-Hard, CLEVR-Tex. |
| Dataset Splits | Yes | Since all models can solve the task when evaluated on the ID dataset, we report the number of steps to reach 98% accuracy on the validation dataset. |
| Hardware Specification | Yes | Each model is trained on NVIDIA Quadro RTX 8000 GPUs with 48GB memory and we use half-precision floating-point format. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and PixelCNN but does not specify versions for core software libraries like Python, PyTorch/TensorFlow, or other dependencies. |
| Experiment Setup | Yes | Table 11 shows the hyperparameters we used for the different datasets in our experiments with SVQ. For the dVAE and Transformer Decoder, we follow the hyperparameters, architecture, and training procedure provided in Singh et al. (2023) for CLEVR-Easy and CLEVR-Hard. |