Probabilistic Matrix Factorization for Automated Machine Learning

Authors: Nicolo Fusi, Rishit Sheth, Melih Elibol

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art.
Researcher Affiliation | Collaboration | Nicolo Fusi, Rishit Sheth (Microsoft Research, New England; {nfusi,rishet}@microsoft.com); Melih Elibol (EECS, University of California, Berkeley; elibol@cs.berkeley.edu)
Pseudocode | No | The paper describes methods and equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Data and software available at https://github.com/rsheth80/pmf-automl/
Open Datasets | Yes | We ran all of the experiments on 553 OpenML [28] datasets
Dataset Splits | Yes | We generated training data for our method by splitting each OpenML dataset in 80% training data, 10% validation data and 10% test data (see the split sketch after the table)
Hardware Specification | No | The paper mentions 'approximately 3 hours on a 16-core Azure machine', but does not specify exact CPU models, GPU models, or memory details.
Software Dependencies | No | The paper mentions software such as 'scikit-learn [17]' and the 'auto-sklearn library [4]', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We set the number of latent dimensions to Q = 20, stochastic gradient descent learning rate to η = 1e-7, and (column) batch-size to 50. The latent space was initialized using PCA, and training was run for 300 epochs (corresponding to approximately 3 hours on a 16-core Azure machine). Finally, we configured the acquisition function with ξ = 0.012. (See the configuration sketch after the table.)
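The 80/10/10 split reported in the Dataset Splits row can be reproduced with standard tooling. The sketch below is a minimal illustration using scikit-learn's train_test_split on placeholder data; the function name split_80_10_10, the synthetic X and y, and the fixed seed are assumptions for illustration and are not taken from the authors' pmf-automl repository, which may seed or stratify differently.

```python
# Minimal sketch of an 80%/10%/10% train/validation/test split for one dataset.
# Placeholder data stands in for a single OpenML dataset; not the authors' code.
import numpy as np
from sklearn.model_selection import train_test_split

def split_80_10_10(X, y, seed=0):
    # First carve out 80% of the rows for training.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.8, random_state=seed)
    # Split the remaining 20% evenly into validation and test (10% each overall).
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Example with synthetic data.
X = np.random.randn(1000, 20)
y = np.random.randint(0, 2, size=1000)
train, val, test = split_80_10_10(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 800 100 100
```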
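For the Experiment Setup row, the sketch below collects the reported hyperparameters into a configuration and pairs them with a standard expected-improvement acquisition function that uses an exploration offset ξ. The config keys, the expected_improvement helper, and this particular EI formula are assumptions for illustration: the paper states the values of Q, the learning rate, the batch size, the epoch count, and ξ, but the surrounding code is not the authors' implementation.

```python
# Hedged sketch of the reported experiment configuration plus a standard
# expected-improvement (EI) acquisition with exploration offset xi.
# Variable names and the EI form are illustrative assumptions, not the paper's code.
import numpy as np
from scipy.stats import norm

config = {
    "latent_dim_Q": 20,         # number of latent dimensions
    "sgd_learning_rate": 1e-7,  # stochastic gradient descent learning rate
    "column_batch_size": 50,    # (column) batch size
    "init": "PCA",              # latent space initialized with PCA
    "epochs": 300,              # training epochs
    "acquisition_xi": 0.012,    # exploration parameter of the acquisition function
}

def expected_improvement(mu, sigma, best, xi=config["acquisition_xi"]):
    # Standard EI for maximization: mu and sigma are the predictive mean and
    # standard deviation of candidate pipelines, best is the incumbent value.
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Example: score three candidate pipelines given predictive mean/std.
mu = np.array([0.80, 0.83, 0.78])
sigma = np.array([0.05, 0.02, 0.10])
print(expected_improvement(mu, sigma, best=0.81))
```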