Transfer Learning with Neural AutoML

Authors: Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate using 21 text classification tasks with varied statistics. The dataset sizes range from 500 to 420k datapoints. The number of classes ranges from 2 to 157, and the mean length of the texts, in characters, ranges from 19 to 20k. The Appendix contains full statistics and references. Each child model is trained on the training set. The accuracy on the validation set is used as the reward for the controller. The top N child models, selected on the validation set, are evaluated on the test set.
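The evaluation protocol above (rank child models by validation accuracy, then report the top-N on the held-out test set) can be sketched as follows. This is an illustrative sketch, not the authors' code; the model records and field names are hypothetical.

```python
def top_n_by_validation(child_models, n):
    """Return the N child models with the highest validation accuracy.

    Selection uses only the validation metric; the test metric is read
    afterwards, so the test set never influences model selection.
    """
    return sorted(child_models, key=lambda m: m["val_acc"], reverse=True)[:n]

# Hypothetical child models produced by the controller.
children = [
    {"name": "child_a", "val_acc": 0.81, "test_acc": 0.79},
    {"name": "child_b", "val_acc": 0.87, "test_acc": 0.85},
    {"name": "child_c", "val_acc": 0.78, "test_acc": 0.80},
]

top = top_n_by_validation(children, n=2)
test_scores = [m["test_acc"] for m in top]  # metrics actually reported
```

Note that `child_c` has the best test accuracy here but is not selected, because selection is done strictly on the validation set.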
Researcher Affiliation | Collaboration | Catherine Wong (MIT, catwong@mit.edu); Neil Houlsby (Google Brain, neilhoulsby@google.com); Yifeng Lu (Google Brain, yifenglu@google.com); Andrea Gesmundo (Google Brain, agesmundo@google.com)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the controller's operation and processes in text and diagrams, but not in a formal algorithm format.
Open Source Code | No | The paper mentions 'The pretrained modules are distributed via TensorFlow Hub: https://www.tensorflow.org/hub'. This is a link to a third-party resource (pretrained modules) that the authors used, not to the source code for the methodology described in the paper itself.
Open Datasets | Yes | We evaluate using 21 text classification tasks with varied statistics. The dataset sizes range from 500 to 420k datapoints. The number of classes ranges from 2 to 157, and the mean length of the texts, in characters, ranges from 19 to 20k. The Appendix contains full statistics and references. For image classification, the paper mentions 'Cifar-10', 'MNIST', and 'Flowers'. The pretrained modules are distributed via TensorFlow Hub: https://www.tensorflow.org/hub.
Dataset Splits | Yes | Datasets without a pre-defined train/validation/test split are split randomly 80/10/10.
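A minimal sketch of the random 80/10/10 split described above. The function name and fixed seed are illustrative; the paper does not specify its splitting code.

```python
import random

def split_80_10_10(examples, seed=0):
    """Randomly split a dataset into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, ~10%
    return train, val, test

train, val, test = split_80_10_10(range(1000))
```

The test split takes the remainder rather than a computed 10%, so every example lands in exactly one split even when the dataset size is not divisible by 10.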
Hardware Specification | No | The paper mentions '800 concurrent GPUs' in the introduction when describing prior work by Zoph and Le, but it does not provide specific hardware details (such as GPU models, CPU models, or cloud configurations) used for its own experiments.
Software Dependencies | No | The paper mentions software components and algorithms such as 'TensorFlow Hub', 'Proximal Adagrad', 'REINFORCE', 'TRPO', 'UREX', 'PPO', and '2-layer LSTM', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | The single search space for all tasks is defined by the following sequence of choices: 1) Pretrained embedding module. 2) Whether to fine-tune the embedding module. 3) Number of hidden layers (HL). 4) HL size. 5) HL activation function. 6) HL normalization scheme to use. 7) HL dropout rate. 8) Deep column learning rate. 9) Deep column regularization weight. 10) Wide layer learning rate. 11) Wide layer regularization weight. 12) Training steps. All models are trained using Proximal Adagrad with batch size 100. The controller is a 2-layer LSTM with 50 units. The action and task embeddings have size 25. The learning rate is set to 10^-4, and the controller receives gradient updates after each child model completes training.
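The 12-choice search space above can be encoded as a simple mapping from choice name to candidate values, from which the controller samples one child-model configuration per step. This is a hypothetical sketch: the candidate values below are placeholders, not the actual options from the paper.

```python
import random

# Illustrative encoding of the 12 sequential choices described above.
# The concrete value lists are assumptions for demonstration only.
SEARCH_SPACE = {
    "embedding_module": ["module_a", "module_b"],  # 1) pretrained embedding
    "fine_tune_embedding": [True, False],          # 2) fine-tune embedding?
    "num_hidden_layers": [1, 2, 3],                # 3) number of HLs
    "hidden_layer_size": [64, 128, 256],           # 4) HL size
    "activation": ["relu", "tanh"],                # 5) HL activation
    "normalization": ["none", "batch_norm"],       # 6) HL normalization
    "dropout_rate": [0.0, 0.2, 0.5],               # 7) HL dropout rate
    "deep_learning_rate": [1e-3, 1e-2],            # 8) deep column LR
    "deep_l2_weight": [0.0, 1e-4],                 # 9) deep column reg.
    "wide_learning_rate": [1e-3, 1e-2],            # 10) wide layer LR
    "wide_l2_weight": [0.0, 1e-4],                 # 11) wide layer reg.
    "train_steps": [1000, 5000, 10000],            # 12) training steps
}

def sample_child_config(rng):
    """Sample one child-model configuration uniformly from the space.

    In the paper the controller's LSTM produces these choices sequentially;
    uniform sampling here just illustrates the structure of the space.
    """
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

config = sample_child_config(random.Random(0))
```

Each sampled `config` fully specifies one child model, which would then be trained and scored on the validation set to produce the controller's reward.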