Probabilistic Matrix Factorization for Automated Machine Learning
Authors: Nicolo Fusi, Rishit Sheth, Melih Elibol
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that our approach quickly identifies high-performing pipelines across a wide range of datasets, significantly outperforming the current state-of-the-art. |
| Researcher Affiliation | Collaboration | Nicolo Fusi, Rishit Sheth: Microsoft Research, New England, {nfusi,rishet}@microsoft.com; Melih Elibol: EECS, University of California, Berkeley, elibol@cs.berkeley.edu |
| Pseudocode | No | The paper describes methods and equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and software available at https://github.com/rsheth80/pmf-automl/ |
| Open Datasets | Yes | We ran all of the experiments on 553 OpenML [28] datasets |
| Dataset Splits | Yes | We generated training data for our method by splitting each OpenML dataset in 80% training data, 10% validation data and 10% test data |
| Hardware Specification | No | The paper mentions 'approximately 3 hours on a 16-core Azure machine', but does not specify exact CPU models, GPU models, or memory details. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn [17]' and 'auto-sklearn library [4]' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We set the number of latent dimensions to Q = 20, stochastic gradient descent learning rate to η = 1e-7, and (column) batch-size to 50. The latent space was initialized using PCA, and training was run for 300 epochs (corresponding to approximately 3 hours on a 16-core Azure machine). Finally, we configured the acquisition function with ξ = 0.012. |
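
The 80% / 10% / 10% split quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `split_indices`, the fixed seed, and the use of a plain (non-stratified) random shuffle are all assumptions, since the quoted text gives only the proportions.

```python
import random


def split_indices(n_rows, seed=0, train_frac=0.8, val_frac=0.1):
    """Shuffle row indices and cut them into train/validation/test parts.

    Hypothetical helper: the paper states the 80/10/10 proportions only,
    so the shuffling strategy and the seed are assumptions.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so no row is dropped
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Assigning the remainder to the test part keeps the three index sets disjoint and exhaustive even when the row count is not a multiple of ten.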