Neural Systematic Binder

Authors: Gautam Singh, Yeongbin Kim, Sungjin Ahn

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we find that SysBinder provides significantly better factor disentanglement within the slots than the conventional object-centric methods, including, for the first time, in visually complex scene images such as CLEVRTex.
Researcher Affiliation | Academia | Gautam Singh (Rutgers University), Yeongbin Kim (KAIST) & Sungjin Ahn (KAIST)
Pseudocode | Yes | Appendix A ("Pseudo-Code") presents Algorithm 1: Neural Systematic Binder.
Open Source Code | Yes | We release all the resources used in this work, including the code and the datasets, at the project link: https://sites.google.com/view/neural-systematic-binder
Open Datasets | Yes | Datasets. We evaluate our model on three datasets: CLEVR-Easy, CLEVR-Hard, and CLEVRTex. These are variants of the original CLEVR dataset (Johnson et al., 2017) ... We release our code and the datasets here: https://sites.google.com/view/neural-systematic-binder
Dataset Splits | No | The paper mentions training steps and parameters but does not explicitly state the dataset splits (e.g., percentages or counts for the training, validation, and test sets).
Hardware Specification | No | The paper provides a table of "Memory Consumption" and "Time per Training Iteration" but does not specify the hardware (e.g., specific GPU or CPU models) on which these measurements were taken.
Software Dependencies | No | The paper mentions software components such as GRUs and MLPs, and implies Python/PyTorch through its described architecture and cited works (e.g., SLATE), but it does not specify exact version numbers for any software dependency.
Experiment Setup | Yes | Table 2: Hyperparameters of our model used in our experiments. The table covers four groups: General: Batch Size 40, Training Steps 200K / 400K; SysBinder: Block Size 256 / 128, # Blocks 8 / 16, # Prototypes 64, # Iterations 3, # Slots 4 / 6, Learning Rate 0.0001 / 0.0003; Transformer Decoder: # Decoder Blocks 8, # Decoder Heads 4 / 8, Hidden Size 192, Dropout 0.1; dVAE: Patch Size 4x4 pixels, Vocabulary Size 4096, Temperature Start 1.0, Temperature End 0.1, Temperature Decay Steps 30000.
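
For reference, the Table 2 values quoted above can be collected into a single configuration object. The sketch below is a minimal, hypothetical arrangement in Python; all field names are illustrative assumptions rather than identifiers from the released code, and where the table lists two values (e.g., 200K / 400K training steps) the mapping to specific datasets is not stated in this excerpt, so the first value is used as the default.

```python
# Hypothetical configuration sketch for the Table 2 hyperparameters.
# Field names are illustrative assumptions, not identifiers from the
# released SysBinder code.
from dataclasses import dataclass


@dataclass
class SysBinderConfig:
    # General
    batch_size: int = 40
    training_steps: int = 200_000   # table also lists 400K for other runs

    # SysBinder
    block_size: int = 256           # table also lists 128
    num_blocks: int = 8             # table also lists 16
    num_prototypes: int = 64
    num_iterations: int = 3
    num_slots: int = 4              # table also lists 6
    learning_rate: float = 1e-4     # table also lists 3e-4

    # Transformer decoder
    num_decoder_blocks: int = 8
    num_decoder_heads: int = 4      # table also lists 8
    hidden_size: int = 192
    dropout: float = 0.1

    # dVAE tokenizer
    patch_size: int = 4             # 4x4-pixel patches
    vocab_size: int = 4096
    tau_start: float = 1.0
    tau_end: float = 0.1
    tau_decay_steps: int = 30_000


def dvae_temperature(step: int, cfg: SysBinderConfig) -> float:
    """Gumbel-softmax temperature for the dVAE at a given training step.

    The table gives only the start/end values and the decay horizon; the
    decay shape is not stated in this excerpt, so a linear ramp is shown
    as one plausible choice.
    """
    frac = min(step / cfg.tau_decay_steps, 1.0)
    return cfg.tau_start + frac * (cfg.tau_end - cfg.tau_start)
```

As a quick check, dvae_temperature(0, SysBinderConfig()) returns the start value 1.0 and reaches the end value 0.1 at step 30,000, after which it stays constant.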