Meta-learning Hyperparameter Performance Prediction with Neural Processes

Authors: Ying Wei, Peilin Zhao, Junzhou Huang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on extensive OpenML datasets and three computer vision datasets demonstrate that the proposed algorithm achieves state-of-the-art performance with at least one order of magnitude fewer trials.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, City University of Hong Kong, Hong Kong; (2) Tencent AI Lab, Shenzhen, China.
Pseudocode | Yes | Algorithm 1 Transferable Neural Processes (TNP) for Hyperparameter Optimization
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | First of all, we consider the OpenML (Vanschoren et al., 2014) platform which contains a large number of datasets covering a wide range of applications. Besides OpenML, we also investigate the effectiveness of TNP on three popular computer vision datasets, including CIFAR-10 (Krizhevsky & Hinton, 2009), MNIST (LeCun et al., 1995), and SVHN (Netzer et al., 2011).
Dataset Splits | Yes | The training, validation, and test sets of each dataset are exactly the same as OpenML provides. ... We take the last 10,000, 10,000, and 6,000 training instances as the validation set for CIFAR-10, MNIST, and SVHN, respectively. (A split sketch in this spirit follows the table.)
Hardware Specification | No | The paper mentions 'CPU overhead time' and 'GPU' in a general sense but does not provide specific details such as model numbers, processor types, or memory specifications for the hardware used in experiments.
Software Dependencies | No | The paper mentions using 'Adam' for optimization, but it does not specify version numbers for any software components, libraries, or programming languages used.
Experiment Setup | Yes | The encoder, the decoder, and the attention embedding function g are all implemented as a two-layer multilayer perceptron with r = 128 hidden units. ... We set the batch size, the number of gradient steps k, the learning rate α for Adam, and the meta update rate ϵ to be 64, 10, 1e-5, and 0.01, respectively. We summarize all hyperparameter settings of TNP in Appendix C.1. (A setup sketch follows the table.)
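
For concreteness, the following is a minimal sketch of the validation split quoted in the Dataset Splits row, assuming torchvision-style dataset loaders; it is not the authors' code, and the data root and transform are placeholders.

```python
# Hypothetical reconstruction of the quoted split rule: the last 10,000 (CIFAR-10),
# 10,000 (MNIST), and 6,000 (SVHN) training instances serve as the validation set.
from torch.utils.data import Subset
from torchvision import datasets, transforms

def split_train_val(dataset, n_val):
    """Hold out the last n_val training instances as the validation set."""
    n_train = len(dataset) - n_val
    return Subset(dataset, range(n_train)), Subset(dataset, range(n_train, len(dataset)))

to_tensor = transforms.ToTensor()  # placeholder transform
cifar = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
svhn = datasets.SVHN("data", split="train", download=True, transform=to_tensor)

cifar_train, cifar_val = split_train_val(cifar, 10_000)
mnist_train, mnist_val = split_train_val(mnist, 10_000)
svhn_train, svhn_val = split_train_val(svhn, 6_000)
```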
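
Likewise, a hedged sketch of the quoted experiment setup in PyTorch: the encoder, the decoder, and the attention embedding function g as two-layer MLPs with r = 128 hidden units, trained with Adam at learning rate 1e-5. The input/output dimensionalities, the ReLU nonlinearity, and the variable names are assumptions not specified in the excerpt above, not the authors' implementation.

```python
# Hypothetical setup sketch under the assumptions stated above.
import torch
import torch.nn as nn

R = 128  # hidden units r

def two_layer_mlp(in_dim, out_dim, hidden=R):
    # Two-layer MLP; the ReLU nonlinearity is an assumption.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

in_dim, out_dim = 16, 2                        # placeholder dimensionalities
encoder = two_layer_mlp(in_dim, R)             # context encoder
decoder = two_layer_mlp(R + in_dim, out_dim)   # predictive decoder
g = two_layer_mlp(in_dim, R)                   # attention embedding function g

params = list(encoder.parameters()) + list(decoder.parameters()) + list(g.parameters())
optimizer = torch.optim.Adam(params, lr=1e-5)  # learning rate alpha = 1e-5

BATCH_SIZE = 64   # batch size
K_STEPS = 10      # number of gradient steps k
META_EPS = 0.01   # meta update rate epsilon
```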