Dictionary Learning for Massive Matrix Factorization
Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods. |
| Researcher Affiliation | Academia | Arthur Mensch (ARTHUR.MENSCH@M4X.ORG): Parietal team, Inria, CEA, Paris-Saclay University, Neurospin, Gif-sur-Yvette, France. Julien Mairal (JULIEN.MAIRAL@INRIA.FR): Thoth team, Inria, Grenoble, France. Bertrand Thirion (BETRAND.THIRION@INRIA.FR) and Gaël Varoquaux (GAEL.VAROQUAUX@INRIA.FR): Parietal team, Inria, CEA, Paris-Saclay University, Neurospin, Gif-sur-Yvette, France. |
| Pseudocode | Yes | Procedure 1: Dictionary Learning for Massive Data |
| Open Source Code | Yes | We use scikit-learn (Pedregosa et al., 2011) in experiments, and have released a Python package for reproducibility (http://github.com/arthurmensch/modl). |
| Open Datasets | Yes | We validate the performance of the proposed algorithm on recommender systems for explicit feedback, a well-studied matrix completion problem. We evaluate the scalability of our method on datasets of different sizes: MovieLens 1M, MovieLens 10M, and the 140M-rating Netflix dataset. |
| Dataset Splits | Yes | For MovieLens datasets, we use a random 25% of the data for testing and the rest for training. We average results over five train/test splits for MovieLens in Table 1. On Netflix, the probe dataset is used for testing. The regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of the MovieLens datasets for evaluation and 1% for Netflix, and grid search is performed on 15 values of λ between $10^{-2}$ and 10. |
| Hardware Specification | Yes | Benchmarks were run on a single 2.7 GHz Xeon CPU, with a 30-component dictionary. |
| Software Dependencies | No | The paper mentions 'scikit-learn (Pedregosa et al., 2011)' and a 'python package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of the MovieLens datasets for evaluation and 1% for Netflix, and grid search is performed on 15 values of λ between $10^{-2}$ and 10. We use mini-batches of size $n = 100$. |
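The Pseudocode row only names Procedure 1, and the Experiment Setup row gives a mini-batch size of 100. The following minimal NumPy sketch illustrates the kind of computation involved in the recommender-system setting. It performs streaming, mini-batch matrix factorization that fits $X \approx DC$ on observed entries only. It is not the authors' Procedure 1. The function names, the ridge penalty λ on both factors, and the per-row statistics bookkeeping are all assumptions. They were chosen so that every step exactly minimizes the masked objective, which means the loss cannot increase.

```python
import numpy as np


def masked_objective(X, mask, D, C, lam):
    """Squared error on observed entries plus ridge penalties on both factors."""
    residual = np.where(mask, X - D @ C, 0.0)
    return float(np.sum(residual ** 2) + lam * (np.sum(D ** 2) + np.sum(C ** 2)))


def streaming_masked_mf(X, mask, n_components=5, lam=0.1, batch_size=100,
                        n_epochs=5, seed=0):
    """Hypothetical sketch: factor X (p x n) as D @ C using entries where mask is True.

    Columns (e.g. users) are visited in random mini-batches. Per-row statistics
    A[i] = sum_j c_j c_j^T and B[i] = sum_j X[i, j] c_j, summed over the columns
    j observed in row i, are kept exact so each row of D can be refit in closed form.
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    k = n_components
    D = rng.standard_normal((p, k)) / np.sqrt(k)
    C = np.zeros((k, n))
    A = np.zeros((p, k, k))
    B = np.zeros((p, k))
    ridge = lam * np.eye(k)
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            touched = np.zeros(p, dtype=bool)
            for j in order[start:start + batch_size]:
                rows = mask[:, j]
                if not rows.any():
                    continue
                Dj, xj = D[rows], X[rows, j]
                # Code step: ridge regression of the observed entries on D.
                new = np.linalg.solve(Dj.T @ Dj + ridge, Dj.T @ xj)
                old = C[:, j].copy()
                # Swap this column's old contribution to A and B for the new one.
                A[rows] += np.outer(new, new) - np.outer(old, old)
                B[rows] += np.outer(xj, new - old)
                C[:, j] = new
                touched |= rows
            # Dictionary step: refit every row seen in this mini-batch.
            for i in np.flatnonzero(touched):
                D[i] = np.linalg.solve(A[i] + ridge, B[i])
    return D, C
```

Each column's old contribution to the statistics is replaced by its new one. As a result, revisiting a column over several epochs does not double-count it. The cost is $O(pk^2)$ memory for the per-row matrices, which may be too much for very large $p$.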
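The Dataset Splits and Experiment Setup rows describe a random 25% test holdout and a grid search over 15 values of λ. The sketch below shows one possible way to reproduce that protocol in NumPy, under three assumptions:

- The 15 values are log-spaced.
- "Split 3 times" means three independent random holdouts rather than 3-fold cross-validation.
- A hypothetical `score_fn` callback trains a model with a given λ and returns a validation error.

```python
import numpy as np


def random_holdout(indices, holdout_fraction, rng):
    """Randomly split an index array into (kept, held_out) parts."""
    perm = rng.permutation(indices)
    n_out = int(round(holdout_fraction * len(indices)))
    return perm[n_out:], perm[:n_out]


def select_lambda(train_idx, score_fn, eval_fraction=0.33, n_splits=3,
                  lambdas=None, seed=0):
    """Pick the lambda with the lowest validation score averaged over n_splits holdouts.

    score_fn(lam, fit_idx, val_idx) is a hypothetical callback that trains on
    fit_idx with regularization lam and returns an error (e.g. RMSE) on val_idx.
    """
    if lambdas is None:
        lambdas = np.logspace(-2, 1, 15)  # 15 log-spaced values in [1e-2, 10]
    rng = np.random.default_rng(seed)
    splits = [random_holdout(train_idx, eval_fraction, rng) for _ in range(n_splits)]
    mean_scores = np.array([
        np.mean([score_fn(lam, fit, val) for fit, val in splits])
        for lam in lambdas
    ])
    best = int(np.argmin(mean_scores))
    return lambdas[best], mean_scores
```

For MovieLens, the default of 0.33 matches the evaluation fraction reported above. For Netflix, one would pass `eval_fraction=0.01` and use the probe set as the test set. Averaging over five test splits amounts to calling `random_holdout` with five different seeds.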