Dictionary Learning for Massive Matrix Factorization

Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

ICML 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods. |
| Researcher Affiliation | Academia | Arthur Mensch (ARTHUR.MENSCH@M4X.ORG): Parietal team, Inria, CEA, Paris-Saclay University, Neurospin, Gif-sur-Yvette, France. Julien Mairal (JULIEN.MAIRAL@INRIA.FR): Thoth team, Inria, Grenoble, France. Bertrand Thirion (BETRAND.THIRION@INRIA.FR) and Gaël Varoquaux (GAEL.VAROQUAUX@INRIA.FR): Parietal team, Inria, CEA, Paris-Saclay University, Neurospin, Gif-sur-Yvette, France. |
| Pseudocode | Yes | Procedure 1: Dictionary Learning for Massive Data |
| Open Source Code | Yes | We use scikit-learn (Pedregosa et al., 2011) in experiments, and have released a Python package for reproducibility (http://github.com/arthurmensch/modl). |
| Open Datasets | Yes | We validate the performance of the proposed algorithm on recommender systems for explicit feedback, a well-studied matrix completion problem. We evaluate the scalability of our method on datasets of different dimensions: MovieLens 1M, MovieLens 10M, and the 140M-rating Netflix dataset. |
| Dataset Splits | Yes | For MovieLens datasets, we use a random 25% of data for test and the rest for training. We average results on five train/test splits for MovieLens in Table 1. On Netflix, the probe dataset is used for testing. Regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of MovieLens datasets for evaluation and 1% for Netflix, and grid search is performed on 15 values of λ between 10^-2 and 10. |
| Hardware Specification | Yes | Benchmarks were run using a single 2.7 GHz Xeon CPU, with a 30-component dictionary. |
| Software Dependencies | No | The paper mentions 'scikit-learn (Pedregosa et al., 2011)' and a 'python package' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of MovieLens datasets for evaluation and 1% for Netflix, and grid search is performed on 15 values of λ between 10^-2 and 10. We use mini-batches of size n 100. |
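
The split and grid-search protocol quoted above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' code: the helper names `log_grid` and `shuffle_splits` are hypothetical, the toy dataset size is made up, and the logarithmic spacing of the 15 values of λ is an assumption (the quoted text gives only the count and the range).

```python
import math
import random


def log_grid(low, high, num):
    """Return `num` values from `low` to `high`, evenly spaced in log10."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]


def shuffle_splits(n_samples, n_splits, held_out, seed=0):
    """Yield (kept, held_out) index lists from independent random shuffles."""
    rng = random.Random(seed)
    n_held = int(round(held_out * n_samples))
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_held:], idx[:n_held]


# 15 candidate values of lambda between 1e-2 and 10.
# Log spacing is an assumption; the source states only the count and range.
lambdas = log_grid(1e-2, 10.0, 15)

# Outer evaluation (MovieLens): five random splits, 25% of ratings held out for test.
n_ratings = 1000  # hypothetical toy size, not a real dataset size
outer = list(shuffle_splits(n_ratings, n_splits=5, held_out=0.25))

# Inner cross-validation on one training set: three splits, 33% held out.
# Inner indices are positions within `train_idx`, not raw rating indices.
train_idx, test_idx = outer[0]
inner = list(shuffle_splits(len(train_idx), n_splits=3, held_out=0.33))
```

For Netflix, the same sketch would apply with `held_out=0.01` for the inner splits, and the probe set replacing the outer random split.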