Transfer Learning with Neural AutoML

Authors: Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate using 21 text classification tasks with varied statistics. The dataset sizes range from 500 to 420k datapoints. The number of classes ranges from 2 to 157, and the mean length of the texts, in characters, ranges from 19 to 20k. The Appendix contains full statistics and references. Each child model is trained on the training set. The accuracy on the validation set is used as the reward for the controller. The top N child models, selected on the validation set, are evaluated on the test set.
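The evaluation protocol above (rank child models by validation accuracy, then report the top-N on the held-out test set) can be sketched as follows. This is an illustrative sketch, not the authors' code; the model records and field names are hypothetical.

```python
def top_n_by_validation(child_models, n):
    """Return the N child models with the highest validation accuracy.

    Selection uses only the validation metric; the test metric is read
    afterwards, so the test set never influences model selection.
    """
    return sorted(child_models, key=lambda m: m["val_acc"], reverse=True)[:n]

# Hypothetical child models produced by the controller.
children = [
    {"name": "child_a", "val_acc": 0.81, "test_acc": 0.79},
    {"name": "child_b", "val_acc": 0.87, "test_acc": 0.85},
    {"name": "child_c", "val_acc": 0.78, "test_acc": 0.80},
]

top = top_n_by_validation(children, n=2)
test_scores = [m["test_acc"] for m in top]  # metrics actually reported
```

Note that `child_c` has the best test accuracy here but is not selected, because selection is done strictly on the validation set.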
Researcher Affiliation | Collaboration | Catherine Wong (MIT, catwong@mit.edu); Neil Houlsby (Google Brain, neilhoulsby@google.com); Yifeng Lu (Google Brain, yifenglu@google.com); Andrea Gesmundo (Google Brain, agesmundo@google.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the controller's operation and processes in text and diagrams, but not in a formal algorithm format.
Open Source Code | No | The paper mentions 'The pretrained modules are distributed via TensorFlow Hub: https://www.tensorflow.org/hub'. This is a link to a third-party resource (pretrained modules) that the authors used, not to the source code for the methodology described in the paper itself.
Open Datasets | Yes | We evaluate using 21 text classification tasks with varied statistics. The dataset sizes range from 500 to 420k datapoints. The number of classes ranges from 2 to 157, and the mean length of the texts, in characters, ranges from 19 to 20k. The Appendix contains full statistics and references. For image classification, the paper mentions 'Cifar-10', 'MNIST', and 'Flowers'. The pretrained modules are distributed via TensorFlow Hub: https://www.tensorflow.org/hub.
Dataset Splits | Yes | Datasets without a pre-defined train/validation/test split are split randomly 80/10/10.
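A minimal sketch of the random 80/10/10 split described above. The function name and fixed seed are illustrative; the paper does not specify its splitting code.

```python
import random

def split_80_10_10(examples, seed=0):
    """Randomly split a dataset into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, ~10%
    return train, val, test

train, val, test = split_80_10_10(range(1000))
```

The test split takes the remainder rather than a computed 10%, so every example lands in exactly one split even when the dataset size is not divisible by 10.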
Hardware Specification | No | The paper mentions '800 concurrent GPUs' in the introduction when describing prior work by Zoph and Le, but it does not provide specific hardware details (such as GPU models, CPU models, or cloud configurations) used for its own experiments.
Software Dependencies | No | The paper mentions software components and algorithms such as 'TensorFlow Hub', 'Proximal Adagrad', 'REINFORCE', 'TRPO', 'UREX', 'PPO', and '2-layer LSTM', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | The single search space for all tasks is defined by the following sequence of choices: 1) Pretrained embedding module. 2) Whether to fine-tune the embedding module. 3) Number of hidden layers (HL). 4) HL size. 5) HL activation function. 6) HL normalization scheme to use. 7) HL dropout rate. 8) Deep column learning rate. 9) Deep column regularization weight. 10) Wide layer learning rate. 11) Wide layer regularization weight. 12) Training steps. All models are trained using Proximal Adagrad with batch size 100. The controller is a 2-layer LSTM with 50 units. The action and task embeddings have size 25. The learning rate is set to 10^-4, and the controller receives gradient updates after each child model completes training.
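The 12-choice search space above can be encoded as a simple mapping from choice name to candidate values, from which the controller samples one child-model configuration per step. This is a hypothetical sketch: the candidate values below are placeholders, not the actual options from the paper.

```python
import random

# Illustrative encoding of the 12 sequential choices described above.
# The concrete value lists are assumptions for demonstration only.
SEARCH_SPACE = {
    "embedding_module": ["module_a", "module_b"],  # 1) pretrained embedding
    "fine_tune_embedding": [True, False],          # 2) fine-tune embedding?
    "num_hidden_layers": [1, 2, 3],                # 3) number of HLs
    "hidden_layer_size": [64, 128, 256],           # 4) HL size
    "activation": ["relu", "tanh"],                # 5) HL activation
    "normalization": ["none", "batch_norm"],       # 6) HL normalization
    "dropout_rate": [0.0, 0.2, 0.5],               # 7) HL dropout rate
    "deep_learning_rate": [1e-3, 1e-2],            # 8) deep column LR
    "deep_l2_weight": [0.0, 1e-4],                 # 9) deep column reg.
    "wide_learning_rate": [1e-3, 1e-2],            # 10) wide layer LR
    "wide_l2_weight": [0.0, 1e-4],                 # 11) wide layer reg.
    "train_steps": [1000, 5000, 10000],            # 12) training steps
}

def sample_child_config(rng):
    """Sample one child-model configuration uniformly from the space.

    In the paper the controller's LSTM produces these choices sequentially;
    uniform sampling here just illustrates the structure of the space.
    """
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

config = sample_child_config(random.Random(0))
```

Each sampled `config` fully specifies one child model, which would then be trained and scored on the validation set to produce the controller's reward.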