Neural Language of Thought Models

Authors: Yi-Fu Wu, Minseung Lee, Sungjin Ahn

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate NLoTM on several 2D and 3D image datasets, demonstrating superior performance in downstream tasks, out-of-distribution generalization, and image generation quality compared to patch-based VQ-VAE and continuous object-centric representations.
Researcher Affiliation | Academia | Yi-Fu Wu (Rutgers University), Minseung Lee (KAIST), Sungjin Ahn (KAIST)
Pseudocode | No | The paper provides architectural diagrams and mathematical formulations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | We will also release the source code upon acceptance of the paper.
Open Datasets | Yes | We evaluate our model on two variants of a 2D Sprites dataset (Watters et al., 2019a; Yoon et al., 2023) and three variants of the CLEVR dataset (Johnson et al., 2017): CLEVR-Easy, CLEVR-Hard, CLEVR-Tex.
Dataset Splits | Yes | Since all models can solve the task when evaluated on the ID dataset, we report the number of steps to reach 98% accuracy on the validation dataset.
Hardware Specification | Yes | Each model is trained on NVIDIA Quadro RTX 8000 GPUs with 48GB memory and we use half-precision floating-point format. (A sketch of such a mixed-precision training loop follows the table.)
Software Dependencies | No | The paper mentions using the Adam optimizer and PixelCNN but does not specify versions for core software libraries such as Python, PyTorch/TensorFlow, or other dependencies.
Experiment Setup | Yes | Table 11 shows the hyperparameters we used for the different datasets in our experiments with SVQ. For the dVAE and Transformer Decoder, we follow the hyperparameters, architecture, and training procedure provided in Singh et al. (2023) for CLEVR-Easy and CLEVR-Hard.
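
The paper reports half-precision training on Quadro RTX 8000 GPUs but, as the Software Dependencies row notes, never names its deep-learning framework. The following is therefore only a minimal sketch of what the described setup could look like, assuming PyTorch's automatic mixed precision together with the Adam optimizer the paper does mention; the model, data, and learning rate are placeholders, not the authors' actual configuration.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder network; the paper's SVQ/NLoTM architecture is not publicly released.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # Adam, as the paper states
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling for fp16

# Synthetic batch standing in for the paper's 2D/3D image datasets.
images = torch.randn(8, 3, 64, 64, device=device)
targets = torch.randint(0, 10, (8,), device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs the forward pass in half precision where numerically safe
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(images), targets)
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscales gradients, then applies the Adam update
    scaler.update()                # adjusts the loss-scale factor for the next step
```

The GradScaler multiplies the loss before the backward pass so that small fp16 gradients do not underflow; it is the standard companion to autocast when training in half precision on GPUs such as the Quadro RTX 8000.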