reproducibilityindex.ai

Probabilistic Matrix Factorization for Automated Machine Learning

Authors: Nicolo Fusi, Rishit Sheth, Melih Elibol

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art.
Researcher Affiliation	Collaboration	Nicolo Fusi, Rishit Sheth Microsoft Research, New England {nfusi,rishet}@microsoft.com Melih Elibol EECS, University of California, Berkeley elibol@cs.berkeley.edu
Pseudocode	No	The paper describes methods and equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Data and software available at https://github.com/rsheth80/pmf-automl/
Open Datasets	Yes	We ran all of the experiments on 553 OpenML [28] datasets
Dataset Splits	Yes	We generated training data for our method by splitting each OpenML dataset in 80% training data, 10% validation data and 10% test data
Hardware Specification	No	The paper mentions 'approximately 3 hours on a 16-core Azure machine', but does not specify exact CPU models, GPU models, or memory details.
Software Dependencies	No	The paper mentions software like 'scikit-learn [17]' and 'auto-sklearn library [4]' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	We set the number of latent dimensions to Q = 20, stochastic gradient descent learning rate to η = 1e-7, and (column) batch-size to 50. The latent space was initialized using PCA, and training was run for 300 epochs (corresponding to approximately 3 hours on a 16-core Azure machine). Finally, we configured the acquisition function with ξ = 0.012.
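
The Dataset Splits row reports an 80% / 10% / 10% division into training, validation, and test data. A minimal sketch of such a split is given here for illustration only. The function name `split_indices`, the fixed seed, and the use of a plain random shuffle are assumptions; the table does not say how the authors shuffled or whether they stratified the data.

```python
import random


def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists.

    Hypothetical helper: the 80/10/10 fractions come from the table above;
    the shuffling scheme and the seed are assumptions.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # seeded so the split is repeatable

    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)

    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed makes the partition repeatable across runs, which is the kind of detail a reproducibility check would look for.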