Rapid Adaptation with Conditionally Shifted Neurons
Authors: Tsendsuren Munkhdalai, Xingdi Yuan, Soroush Mehri, Adam Trischler
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed CSNs on tasks from the vision and language domains. Below we describe the datasets we evaluate on and the corresponding preprocessing steps, followed by test results and an ablation study. |
| Researcher Affiliation | Industry | Microsoft Research, Montréal, Québec, Canada. |
| Pseudocode | No | No, the paper does not contain structured pseudocode or algorithm blocks; it primarily describes the methods using text and mathematical equations. |
| Open Source Code | No | No, the paper states 'Code and data will be available at https://aka.ms/csns', which indicates a future release and not immediate, concrete access to the source code at the time of publication. |
| Open Datasets | Yes | In the vision domain, we used two widely adopted few-shot classification benchmarks: the Omniglot and Mini-ImageNet datasets. Omniglot consists of images from 1623 classes from 50 different alphabets, with only 20 images per class (Lake et al., 2015). Mini-ImageNet features 84×84-pixel color images from 100 classes... We ran our experiments on the class subset released by Ravi & Larochelle (2017). To evaluate the effectiveness of recurrent models with conditionally shifted neurons, we ran experiments on the few-shot Penn Treebank (PTB) language modeling task introduced by Vinyals et al. (2016). |
| Dataset Splits | Yes | Mini-ImageNet features 84×84-pixel color images from 100 classes (64/16/20 for training/validation/test splits) and each class has 600 exemplar images. |
| Hardware Specification | No | No, the paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | No, the paper does not list specific software dependencies with their version numbers required for replication (e.g., 'Python 3.8, PyTorch 1.9'). While it cites Chainer, it does not state it as a dependency with a version number for its implementation. |
| Experiment Setup | No | No, the paper mentions general training aspects like 'optimize the model parameters end-to-end via stochastic gradient descent (SGD)' and references 'Full implementation details can be found in Appendix A', but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations within the main text provided. |
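For context on the method being assessed: the paper describes conditionally shifted neurons (CSNs) as neurons whose activation values are modified by task-specific shifts retrieved from a memory module after a description (support-set) phase. The sketch below is a minimal, assumed reading of that idea, not the authors' implementation; the placement of the shift (added to the pre-activation before the nonlinearity) and all names (`csn_layer`, `shift`) are illustrative.

```python
import numpy as np

def csn_layer(x, W, b, shift):
    """Feed-forward layer with a task-conditioned shift.

    `shift` stands in for the per-task offset that the paper
    retrieves from a memory module; the retrieval mechanism
    itself is outside the scope of this sketch.
    """
    pre = x @ W + b                       # ordinary affine transform
    return np.maximum(0.0, pre + shift)   # ReLU over shifted pre-activations
```

On this reading, the base parameters `W` and `b` stay fixed during adaptation and only `shift` changes per task, which is what makes the adaptation rapid.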
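Similarly, the 64/16/20 class split quoted in the "Dataset Splits" row implies the standard episodic protocol for few-shot benchmarks: each episode samples its classes from one split. Below is a hedged sketch of an N-way, K-shot episode sampler; `class_to_images` and all defaults are hypothetical, not taken from the paper.

```python
import random

def sample_episode(class_to_images, n_way=5, k_shot=1, n_query=15):
    """Draw one N-way, K-shot episode from a class split.

    `class_to_images` maps a class label (e.g., one of the 64
    Mini-ImageNet meta-training classes) to its list of images.
    """
    classes = random.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(class_to_images[cls], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query
```

Validation and test episodes are drawn the same way from the held-out 16 and 20 classes, so meta-test classes are never seen during training.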