Mutual exclusivity as a challenge for deep neural networks

Authors: Kanishk Gandhi, Brenden M. Lake

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we investigate whether or not vanilla neural architectures have an ME bias, demonstrating that they lack this learning assumption. Moreover, we show that their inductive biases are poorly matched to lifelong learning formulations of classification and translation. We demonstrate that there is a compelling case for designing task-general neural networks that learn through mutual exclusivity, which remains an open challenge.
Researcher Affiliation | Collaboration | Kanishk Gandhi, New York University, kanishk.gandhi@nyu.edu; Brenden Lake, New York University and Facebook AI Research, brenden@nyu.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We analyze three common datasets for machine translation, each consisting of pairs of sentences in two languages (see Table 1). The vocabularies are truncated based on word frequency in accordance with the standard practices for training neural machine translation models [32, 33, 34]. This section examines the Omniglot dataset [35] and the ImageNet dataset [36]. (A loading sketch follows the table.)
Dataset Splits | No | The paper mentions training on 90 pairs and evaluating on 10 test pairs for the synthetic data, and discusses how classes are encountered over time in lifelong learning scenarios. However, it does not explicitly provide the training/validation/test splits (e.g., percentages or exact counts) for the real-world datasets that would be needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU models, GPU models, memory, or cloud instance types) used to run its experiments; it refers only generally to training neural networks.
Software Dependencies | No | The paper mentions various neural network components and optimizers (e.g., ReLU, TanH, Sigmoid, Adam, Momentum, SGD) but does not specify version numbers for any software frameworks or libraries used (e.g., TensorFlow, PyTorch, scikit-learn).
Experiment Setup | Yes | A wide range of neural architectures are evaluated on the mutual exclusivity test. We use an embedding layer to map the input symbols to vectors of size 20 or 100, followed optionally by a hidden layer, and then by a 100-way softmax output layer. The networks are trained with different activation functions (ReLUs [27], TanH, Sigmoid), optimizers (Adam [28], Momentum, SGD), learning rates (0.1, 0.01, 0.001) and regularizers (weight decay, batch-normalisation [29], dropout [30], and entropy regularization (see Appendix B.1)). The models are trained to maximize log-likelihood. Altogether, we evaluated over 400 different models on the synthetic ME task. For Omniglot, a convolutional neural network was trained on 1623-way classification. The architecture consists of 3 convolutional layers (each consisting of 5×5 kernels and 64 feature maps), a fully connected layer (576×128) and a softmax classification layer. It was trained with a batch size of 16 using an Adam optimizer and a learning rate of 0.001. For ImageNet, a ResNet-18 model [37] was trained on 1000-way classification with a batch size of 256, using an Adam optimizer and a learning rate of 0.001. (A model sketch follows the table.)
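
Both image datasets cited in the Open Datasets row are publicly available. The following is a minimal loading sketch using torchvision; the framework, local paths, and transforms are assumptions, since the paper names neither a framework nor a data pipeline.

```python
import torchvision
from torchvision import transforms

# Omniglot [35]: torchvision can download it automatically. background=True
# gives the 964-character "background" set; the paper's 1623-way task uses
# all characters (background + evaluation sets combined).
omniglot = torchvision.datasets.Omniglot(
    root="data",  # hypothetical local path
    background=True,
    download=True,
    transform=transforms.ToTensor(),
)

# ImageNet [36]: torchvision expects the official archives to be downloaded
# manually from image-net.org and placed under `root` before this call.
imagenet = torchvision.datasets.ImageNet(root="data/imagenet", split="train")
```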
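
The Experiment Setup row is detailed enough to sketch two of the models. Below is a minimal PyTorch sketch (the paper does not name its framework): layer sizes follow the quoted description, while the padding, pooling, hidden-layer width, and 28×28 input resolution are assumptions chosen so the flattened feature map matches the stated 576 units. The ImageNet model is omitted since a standard ResNet-18 ships with torchvision (torchvision.models.resnet18).

```python
import torch
import torch.nn as nn

# Synthetic ME classifier: embedding (dim 20 or 100) -> optional hidden
# layer -> 100-way softmax, per the quoted setup. The hidden width and the
# ReLU choice are assumptions; the paper also sweeps TanH and Sigmoid.
class MEClassifier(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=20, hidden_dim=None, n_classes=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layers, in_dim = [], embed_dim
        if hidden_dim is not None:
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, n_classes))  # logits; softmax lives in the loss
        self.head = nn.Sequential(*layers)

    def forward(self, x):
        return self.head(self.embed(x))

# Omniglot CNN: 3 conv layers (5x5 kernels, 64 feature maps each), a
# 576 -> 128 fully connected layer, and a 1623-way softmax. "Same" padding,
# 2x2 max-pooling, and 28x28 grayscale inputs are assumptions that make the
# flattened feature map come out to 64 * 3 * 3 = 576 as stated.
class OmniglotCNN(nn.Module):
    def __init__(self, n_classes=1623):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(576, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training as described: Adam with lr 0.001 and cross-entropy loss (i.e.,
# maximizing log-likelihood); the batch size of 16 would be set in the
# DataLoader.
model = OmniglotCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
```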