Discrete-Valued Neural Communication

Authors: Dianbo Liu, Alex M. Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael C. Mozer, Yoshua Bengio

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that discrete-valued neural communication (DVNC) substantially improves systematic generalization in a variety of architectures: transformers, modular architectures, and graph neural networks. |
| Researcher Affiliation | Collaboration | Dianbo Liu (Mila), Alex Lamb (Mila), Kenji Kawaguchi (Harvard University), Anirudh Goyal (Mila), Chen Sun (Mila), Michael C. Mozer (Google Research, Brain Team), Yoshua Bengio (Mila) |
| Pseudocode | Yes | Appendix E presents the pseudocode for RIMs with discretization. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We adapted and modified the original 2D shapes and 3D shapes movement tasks from Kipf et al. (2019)... We experimented with the Sort-of-CLEVR visual relational reasoning task... (Santoro et al., 2017)... we consider the task of classifying MNIST digits as sequences of pixels (Krueger et al., 2016). |
| Dataset Splits | No | The paper mentions 'training data', a 'test set', and 'OOD settings' (e.g., 'five objects are available in training data, three objects are available in OOD-1 and only two objects are available in OOD-2'), but it does not provide specific percentages or counts for training/validation/test splits, nor does it reference predefined splits that could be used for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as exact GPU/CPU models or memory amounts, used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | We picked β = 0.25 as in the original VQ-VAE paper (Oord et al., 2017). We initialized e using k-means clustering on vectors h with k = L and trained the codebook together with other parts of the model by gradient descent. (A minimal sketch of this discretization setup follows the table.) |
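The discretization described in the Experiment Setup row is the VQ-VAE-style vector quantization that DVNC applies to communication vectors. Below is a minimal, hypothetical PyTorch sketch of such a layer: the class name `DiscreteComm`, the splitting of each vector into `num_segments` pieces quantized against one shared codebook, and all tensor shapes are illustrative assumptions, not the authors' code (the paper links no repository).

```python
# Hypothetical sketch of VQ-style discretization for DVNC; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscreteComm(nn.Module):
    """Quantize communication vectors against a shared codebook (VQ-VAE style)."""

    def __init__(self, dim, num_codes, num_segments=1, beta=0.25):
        super().__init__()
        assert dim % num_segments == 0
        self.seg_dim = dim // num_segments
        self.beta = beta  # commitment weight; 0.25 as in the original VQ-VAE paper
        self.codebook = nn.Embedding(num_codes, self.seg_dim)  # the codebook e, with L entries

    def forward(self, h):
        # h: (batch, dim) pre-quantization vectors; split into segments of size seg_dim
        segs = h.reshape(-1, self.seg_dim)
        dists = torch.cdist(segs, self.codebook.weight)   # distances to all codes
        idx = dists.argmin(dim=-1)                         # nearest code per segment
        q = self.codebook(idx)                             # quantized segments
        # codebook loss pulls codes toward encoder outputs; the commitment loss
        # (scaled by beta) pulls encoder outputs toward their assigned codes
        loss = F.mse_loss(q, segs.detach()) + self.beta * F.mse_loss(segs, q.detach())
        # straight-through estimator: copy gradients through the non-differentiable argmin
        q = segs + (q - segs).detach()
        return q.reshape_as(h), loss, idx
```

The auxiliary `loss` would be added to the task loss, so the codebook is trained jointly with the rest of the model by gradient descent, as the setup quote describes.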
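The setup also initializes the codebook e by k-means clustering on the vectors h with k = L. A hedged sketch of that initialization, assuming scikit-learn's `KMeans` and the hypothetical `DiscreteComm` layer above:

```python
# Hypothetical k-means codebook initialization; the helper name and use of
# scikit-learn are assumptions, not details given in the paper.
import torch
from sklearn.cluster import KMeans


def init_codebook_with_kmeans(layer, h_samples):
    """Fit k = L cluster centers on pre-quantization vectors and copy them into the codebook."""
    segs = h_samples.reshape(-1, layer.seg_dim).detach().cpu().numpy()
    km = KMeans(n_clusters=layer.codebook.num_embeddings, n_init=10).fit(segs)
    with torch.no_grad():
        layer.codebook.weight.copy_(
            torch.as_tensor(km.cluster_centers_, dtype=layer.codebook.weight.dtype)
        )
```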