Meta-learning Hyperparameter Performance Prediction with Neural Processes
Authors: Ying Wei, Peilin Zhao, Junzhou Huang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on an extensive set of OpenML datasets and three computer vision datasets demonstrate that the proposed algorithm achieves state-of-the-art performance with at least an order of magnitude fewer trials. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science, City University of Hong Kong, Hong Kong; (2) Tencent AI Lab, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 Transferable Neural Processes (TNP) for Hyperparameter Optimization |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | First of all, we consider the OpenML (Vanschoren et al., 2014) platform which contains a large number of datasets covering a wide range of applications. Besides OpenML, we also investigate the effectiveness of TNP on three popular computer vision datasets, including CIFAR-10 (Krizhevsky & Hinton, 2009), MNIST (LeCun et al., 1995), and SVHN (Netzer et al., 2011). |
| Dataset Splits | Yes | The training, validation, and test sets of each dataset are exactly the same as OpenML provides. ... We take the last 10,000, 10,000, and 6,000 training instances as the validation set for CIFAR-10, MNIST, and SVHN, respectively. (A split sketch follows the table.) |
| Hardware Specification | No | The paper mentions 'CPU overhead time' and 'GPU' in a general sense but does not provide specific details such as model numbers, processor types, or memory specifications for the hardware used in experiments. |
| Software Dependencies | No | The paper mentions using 'Adam' for optimization, but it does not specify version numbers for any software components, libraries, or programming languages used. |
| Experiment Setup | Yes | The encoder, the decoder, and the attention embedding function g are all implemented as two-layer multilayer perceptrons with r = 128 hidden units. ... We set the batch size, the number of gradient steps k, the learning rate α for Adam, and the meta update rate ϵ to be 64, 10, 1e-5, and 0.01, respectively. We summarize all hyperparameter settings of TNP in Appendix C.1. (A configuration sketch follows the table.) |
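The Dataset Splits row describes holding out the last 10,000, 10,000, and 6,000 training instances of CIFAR-10, MNIST, and SVHN as validation sets. The paper does not release code, so the following is a minimal sketch assuming torchvision's default dataset ordering; the root path, transforms, and helper name are illustrative only.

```python
# Hypothetical reconstruction of the validation splits described above:
# the last 10,000 (CIFAR-10), 10,000 (MNIST), and 6,000 (SVHN) training
# instances are held out as validation sets.
from torch.utils.data import Subset
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

def split_train_val(full_train, n_val):
    """Hold out the last n_val instances of a training set as validation."""
    n_train = len(full_train) - n_val
    train = Subset(full_train, range(n_train))
    val = Subset(full_train, range(n_train, len(full_train)))
    return train, val

cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
svhn = datasets.SVHN(root="data", split="train", download=True, transform=to_tensor)

cifar_train, cifar_val = split_train_val(cifar10, 10_000)
mnist_train, mnist_val = split_train_val(mnist, 10_000)
svhn_train, svhn_val = split_train_val(svhn, 6_000)
```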
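The Experiment Setup row reports the network sizes and optimizer settings but no code. Below is a minimal PyTorch sketch of those settings, assuming ReLU activations and an arbitrary 10-dimensional hyperparameter configuration space; the activation, input/output dimensionalities, and module names are assumptions, not details from the paper.

```python
# Hypothetical sketch of the reported settings: encoder, decoder, and attention
# embedding g are two-layer MLPs with r = 128 hidden units; Adam with learning
# rate 1e-5, batch size 64, k = 10 gradient steps, and meta update rate 0.01.
import torch
from torch import nn

R_HIDDEN = 128    # hidden units r (from the paper)
BATCH_SIZE = 64   # batch size (from the paper)
K_STEPS = 10      # gradient steps k per task (from the paper)
LR = 1e-5         # Adam learning rate alpha (from the paper)
META_EPS = 0.01   # meta update rate epsilon (from the paper)
X_DIM = 10        # dimensionality of a hyperparameter configuration (assumed)

def two_layer_mlp(in_dim, out_dim, hidden=R_HIDDEN):
    """Two-layer MLP used for the encoder, decoder, and attention embedding g."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

encoder = two_layer_mlp(in_dim=X_DIM + 1, out_dim=R_HIDDEN)   # (x, y) -> representation
decoder = two_layer_mlp(in_dim=R_HIDDEN + X_DIM, out_dim=2)   # (r, x) -> mean, log-variance
g = two_layer_mlp(in_dim=X_DIM, out_dim=R_HIDDEN)             # attention embedding

params = list(encoder.parameters()) + list(decoder.parameters()) + list(g.parameters())
optimizer = torch.optim.Adam(params, lr=LR)
```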