reproducibilityindex.ai

Dictionary Learning for Massive Matrix Factorization

Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods.
Researcher Affiliation	Academia	Arthur Mensch (ARTHUR.MENSCH@M4X.ORG): Parietal team, Inria, CEA, Paris-Saclay University; Neurospin, Gif-sur-Yvette, France. Julien Mairal (JULIEN.MAIRAL@INRIA.FR): Thoth team, Inria, Grenoble, France. Bertrand Thirion (BETRAND.THIRION@INRIA.FR) and Gaël Varoquaux (GAEL.VAROQUAUX@INRIA.FR): Parietal team, Inria, CEA, Paris-Saclay University; Neurospin, Gif-sur-Yvette, France.
Pseudocode	Yes	Procedure 1 Dictionary Learning for Massive Data
Open Source Code	Yes	We use scikit-learn (Pedregosa et al., 2011) in experiments, and have released a Python package for reproducibility (footnote 1: http://github.com/arthurmensch/modl).
Open Datasets	Yes	We validate the performance of the proposed algorithm on recommender systems for explicit feedback, a well-studied matrix completion problem. We evaluate the scalability of our method on datasets of different dimensions: MovieLens 1M, MovieLens 10M, and the 140M-rating Netflix dataset.
Dataset Splits	Yes	For the MovieLens datasets, we use a random 25% of the data for testing and the rest for training. We average results over five train/test splits for MovieLens in Table 1. On Netflix, the probe dataset is used for testing. The regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of the MovieLens datasets for evaluation and 1% for Netflix, and a grid search is performed on 15 values of λ between 10^-2 and 10.
Hardware Specification	Yes	Benchmarks were run using a single 2.7 GHz Xeon CPU, with a 30-component dictionary.
Software Dependencies	No	The paper mentions 'scikit-learn (Pedregosa et al., 2011)' and a 'Python package' but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	The regularization parameter λ is set by cross-validation on the training set: the training data is split 3 times, keeping 33% of the MovieLens datasets for evaluation and 1% for Netflix, and a grid search is performed on 15 values of λ between 10^-2 and 10. We use mini-batches of size n = 100.
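The Dataset Splits and Experiment Setup rows describe the MovieLens evaluation protocol: a random 25% test split, then 3 random splits of the training data that each hold out 33% for validation, and a grid search over 15 values of λ between 10^-2 and 10. The following is a minimal NumPy sketch of that protocol, not the authors' code. It assumes ratings are addressed by integer index and that the 15 values are log-spaced (the rows only give the endpoints); `fit_and_score` is a hypothetical callback standing in for model fitting and validation scoring.

```python
import numpy as np


def train_test_split_ratings(n_ratings, test_fraction, rng):
    """Randomly hold out a fraction of rating indices for testing."""
    perm = rng.permutation(n_ratings)
    n_test = int(round(test_fraction * n_ratings))
    return perm[n_test:], perm[:n_test]  # (train indices, test indices)


def lambda_grid(n_values=15, low=1e-2, high=10.0):
    """Candidate regularization values; log spacing is an assumption."""
    return np.logspace(np.log10(low), np.log10(high), n_values)


def validation_splits(train_idx, n_splits, val_fraction, rng):
    """Yield independent random (fit, validation) splits of the training indices."""
    n_val = int(round(val_fraction * len(train_idx)))
    for _ in range(n_splits):
        perm = rng.permutation(train_idx)
        yield perm[n_val:], perm[:n_val]


def select_lambda(train_idx, fit_and_score, rng, n_splits=3, val_fraction=0.33):
    """Return the grid value of lambda with the lowest mean validation error.

    fit_and_score(fit_idx, val_idx, lam) is a hypothetical callback that fits the
    model on fit_idx with regularization lam and returns an error on val_idx.
    """
    # Draw the splits once so every lambda is scored on the same folds.
    splits = list(validation_splits(train_idx, n_splits, val_fraction, rng))
    grid = lambda_grid()
    mean_err = [np.mean([fit_and_score(f, v, lam) for f, v in splits]) for lam in grid]
    return grid[int(np.argmin(mean_err))]


rng = np.random.default_rng(0)
train_idx, test_idx = train_test_split_ratings(10_000, 0.25, rng)  # toy size
```

For Netflix, the rows state that the probe set serves as the test set and that 1% of the training data is held out for validation, so under this sketch only `val_fraction=0.01` would change.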
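The Pseudocode row cites Procedure 1, and the setup rows mention mini-batches of size n = 100 and a 30-component dictionary. Procedure 1 is not reproduced on this page, so the following is only a generic sketch of online matrix factorization with missing entries, not the authors' subsampled algorithm. It assumes ridge-regularized codes computed on observed rows only, and a row-wise ridge dictionary update driven by statistics accumulated over all columns seen so far, in the spirit of online dictionary learning. The function names, the parameter `lam` (playing the role of λ), and the toy data are hypothetical.

```python
import numpy as np


def masked_ridge_code(D, x, obs, lam):
    """Ridge-regression code for one column, fitted on observed rows only."""
    D_obs = D[obs]
    k = D.shape[1]
    return np.linalg.solve(D_obs.T @ D_obs + lam * np.eye(k), D_obs.T @ x[obs])


def online_masked_factorization(X, mask, n_components=30, batch_size=100,
                                lam=0.1, n_epochs=5, seed=0):
    """Hypothetical sketch: learn D so that X[:, t] ~ D @ code_t on observed entries.

    Rows of X are features (e.g. items); columns are samples (e.g. users).
    """
    rng = np.random.default_rng(seed)
    p, n = X.shape
    k = n_components
    D = rng.standard_normal((p, k)) / np.sqrt(k)
    # Per-row statistics over columns t that observe row j:
    # A[j] = sum of outer(code_t, code_t), b[j] = sum of X[j, t] * code_t.
    A = np.zeros((p, k, k))
    b = np.zeros((p, k))
    for _ in range(n_epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            for t in order[start:start + batch_size]:
                obs = mask[:, t]
                if not obs.any():
                    continue
                code = masked_ridge_code(D, X[:, t], obs, lam)
                A[obs] += np.outer(code, code)
                b[obs] += np.outer(X[obs, t], code)
            # Row-wise ridge update after each mini-batch: (A[j] + lam I) D[j] = b[j].
            D = np.linalg.solve(A + lam * np.eye(k), b[..., None])[..., 0]
    return D


rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 300))  # toy rank-3 data
mask = rng.random(X.shape) < 0.5  # roughly half the entries observed
D = online_masked_factorization(X, mask, n_components=3, lam=0.1, n_epochs=10)
```

The defaults `n_components=30` and `batch_size=100` mirror the values in the rows; the toy call uses 3 components only to match its rank-3 data. Reusing stale codes in the accumulated statistics keeps each dictionary update cheap, at the cost of early codes lingering until later epochs dilute them.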