Discrete-Valued Neural Communication
Authors: Dianbo Liu, Alex M. Lamb, Kenji Kawaguchi, Anirudh Goyal, Chen Sun, Michael C. Mozer, Yoshua Bengio
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that discrete-valued neural communication (DVNC) substantially improves systematic generalization in a variety of architectures: transformers, modular architectures, and graph neural networks. |
| Researcher Affiliation | Collaboration | Dianbo Liu (Mila); Alex Lamb (Mila); Kenji Kawaguchi (Harvard University); Anirudh Goyal (Mila); Chen Sun (Mila); Michael C. Mozer (Google Research, Brain Team); Yoshua Bengio (Mila) |
| Pseudocode | Yes | Appendix E presents the pseudocode for RIMs with discretization. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We adapted and modified the original 2D shapes and 3D shapes movement tasks from Kipf et al. (2019)... We experimented with the Sort-of-CLEVR visual relational reasoning task... (Santoro et al., 2017)... we consider the task of classifying MNIST digits as sequences of pixels (Krueger et al., 2016). |
| Dataset Splits | No | The paper mentions 'training data', 'test set', and 'OOD settings' (e.g., 'five objects are available in training data, three objects are available in OOD-1 and only two objects are available in OOD-2') but does not provide specific percentages or counts for training/validation/test splits, nor does it reference predefined splits with citations for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | We picked β = 0.25 as in the original VQ-VAE paper (Oord et al., 2017). We initialized e using k-means clustering on vectors h with k = L and trained the codebook together with other parts of the model by gradient descent. |
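
The experiment setup row describes a VQ-VAE-style discretization: communication vectors h are snapped to their nearest codebook entry, the commitment term is weighted by β = 0.25 as in Oord et al. (2017), and the codebook e is initialized with k-means (k = L) and trained jointly by gradient descent. The sketch below is a minimal illustration of that mechanism under stated assumptions; the module name, the default sizes, and the random (rather than k-means) codebook initialization are ours, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiscreteCommunication(nn.Module):
    """Minimal sketch of VQ-style discretization of communication vectors.

    Hypothetical module: a codebook of `num_codes` entries of dimension `dim`;
    each input vector is replaced by its nearest codebook entry, gradients flow
    through a straight-through estimator, and the commitment loss is weighted
    by beta = 0.25 as in the VQ-VAE paper.
    """

    def __init__(self, num_codes: int = 64, dim: int = 32, beta: float = 0.25):
        super().__init__()
        self.beta = beta
        # Codebook e; the paper initializes it with k-means on vectors h,
        # here it is initialized randomly for simplicity (assumption).
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h: torch.Tensor):
        # h: (batch, dim) communication vectors to be discretized.
        dist = torch.cdist(h, self.codebook.weight)   # distances to all codes
        idx = dist.argmin(dim=-1)                     # index of nearest code
        z = self.codebook(idx)                        # quantized vectors

        # Codebook loss plus beta-weighted commitment loss (VQ-VAE objective).
        loss = F.mse_loss(z, h.detach()) + self.beta * F.mse_loss(h, z.detach())

        # Straight-through estimator: copy gradients from z back to h.
        z = h + (z - h).detach()
        return z, idx, loss


# Example usage with hypothetical sizes:
# z, idx, vq_loss = DiscreteCommunication(num_codes=64, dim=32)(torch.randn(8, 32))
```

In the paper, this discretization is applied to the messages exchanged between components (e.g., RIMs, transformer heads, GNN edges); the sketch only shows the quantization step itself, not the surrounding architecture.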