Hyperparameter Learning via Distributional Transfer

Authors: Ho Chung Law, Peilin Zhao, Leung Sing Chan, Junzhou Huang, Dino Sejdinovic

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, across a range of regression and classification tasks, our methodology performs favourably at initialisation and has a faster convergence compared to existing baselines; in some cases, the optimal accuracy is achieved in just a few evaluations.
Researcher Affiliation | Collaboration | Ho Chung Leon Law, University of Oxford, ho.law@stats.ox.ac.uk; Peilin Zhao, Tencent AI Lab, masonzhao@tencent.com; Lucian Chan, University of Oxford, leung.chan@stats.ox.ac.uk; Junzhou Huang, Tencent AI Lab, joehhuang@tencent.com; Dino Sejdinovic, University of Oxford, dino.sejdinovic@stats.ox.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It mentions using TensorFlow for implementation but does not share its own code.
Open Datasets | Yes | In particular, the Protein dataset consists of 7 different proteins extracted from [9]: ADAM17, AKT1, BRAF, COX1, FXA, GR, VEGFR2.
Dataset Splits | Yes | For testing, we use the same number of samples s_i for toy data, while using a 60-40 train-test split for real data.
Hardware Specification | Yes | Training time is less than 2 minutes on a standard 2.60GHz single-core CPU in all experiments.
Software Dependencies | No | The paper mentions using 'TensorFlow [1] for implementation' and 'SciPy [14]', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For φx and φy we will use a single hidden layer NN with tanh activation (with 20 hidden and 10 output units), except for classification tasks, where we use a one-hot encoding for φy. [...] For BLR, we will follow [26] and take feature map υ to be a NN with three 50-unit layers and tanh activation. [...] We take the embedding batch-size b = 1000, and learning rate for ADAM to be 0.005.
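
Since the paper does not release code, the quoted experiment setup can only be approximated. Below is a minimal TensorFlow/Keras sketch of the described feature maps and optimiser settings; the variable names (phi_x, phi_y, upsilon), input dimensions, number of classes, and the linear output layer of the feature maps are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf

# Assumed, dataset-dependent dimensions (not stated in the quoted excerpt).
INPUT_DIM_X = 8      # dimensionality of inputs x
INPUT_DIM_Y = 1      # dimensionality of regression targets y
NUM_CLASSES = 3      # number of classes for the classification / one-hot case


def make_phi(input_dim):
    """Single hidden layer NN with tanh: 20 hidden units, 10 output units.

    The output layer is left linear here; the paper only specifies tanh
    for the hidden layer, so this is an assumption.
    """
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(20, activation="tanh"),
        tf.keras.layers.Dense(10),
    ])


phi_x = make_phi(INPUT_DIM_X)   # feature map for inputs x
phi_y = make_phi(INPUT_DIM_Y)   # feature map for targets y (regression tasks)

# For classification tasks the paper instead one-hot encodes y, e.g.:
# phi_y = lambda y: tf.one_hot(y, depth=NUM_CLASSES)

# BLR feature map: a NN with three 50-unit tanh layers, per the quoted setup.
upsilon = tf.keras.Sequential([
    tf.keras.Input(shape=(INPUT_DIM_X,)),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(50, activation="tanh"),
])

# Optimiser and embedding batch size quoted in the paper.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.005)
EMBEDDING_BATCH_SIZE = 1000
```

How these feature maps are combined into the distributional-transfer objective is not reproduced here; the sketch only fixes the architecture and optimiser hyperparameters quoted in the table row above.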