Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices
Authors: José Miguel Hernández-Lobato, Neil Houlsby, Zoubin Ghahramani
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our method experimentally, demonstrating faster convergence than batch alternatives (Raiko et al., 2007) and yielding more accurate solutions than existing scalable variational methods (Nakajima et al., 2010; Seeger & Bouchard, 2012; Paquet & Koenigstein, 2013). SIBM is evaluated in experiments with synthetic and real-world binary matrices. |
| Researcher Affiliation | Academia | José Miguel Hernández-Lobato (JMH233@CAM.AC.UK), Neil Houlsby (NMTH2@CAM.AC.UK), Zoubin Ghahramani (ZOUBIN@ENG.CAM.AC.UK), University of Cambridge, Department of Engineering, Cambridge CB2 1PZ, UK |
| Pseudocode | Yes | Algorithm 1 Stochastic Inference for Binary Matrices |
| Open Source Code | Yes | All the code and data used is publicly available (http://jmhl.org). |
| Open Datasets | Yes | We consider two real-world datasets from the FIMI repository: purchase data from a retail store (retail) (Brijs et al., 1999) and click data from an online news portal (Kosarak). We include two datasets from the 2000 KDD Cup (Kohavi et al., 2000; Zheng et al., 2001), point of sale data from a retailer (POS, originally BMS-POS) and click data from an e-commerce website (Web View, originally BMS-Web View-2). Finally, we include the Netflix data, treating 4-5 star ratings as ones. All the code and data used is publicly available (http://jmhl.org). |
| Dataset Splits | Yes | Each matrix is randomly split into a training matrix and a set of test entries with value one. The training matrix is generated by randomly removing an entry with value one from each row in the original matrix and adding it to the test set. Predictive performance is evaluated using recall at N, a popular metric for recommendation tasks (Gunawardana & Shani, 2009) (equivalent to precision when a single one is held out from each row). For this, we iterate over the rows, using (3) to compute the probability of each zero entry actually taking value one. We select the top N zero entries with highest probability in that row. Recall is computed as the average number of times that the test entry appears in this list. We use N = 10 and repeat the experiment 25 times on each small dataset, and 10 times on each large one. A sketch of this split and the recall@10 computation is given after the table. |
| Hardware Specification | No | The paper states that "all methods are coded in C" but does not specify any hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper states that "all methods are coded in C" but does not provide specific software dependencies (e.g., library names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We pre-process the original datasets so that we can compare to the computationally expensive batch approach. We keep the 1000 columns with the highest number of ones and discard rows with fewer than 10 ones. We consider small and large versions of each dataset. We subsample 2000 rows for the small and 40,000 rows for the large datasets, except in retail and Web View, where we use approximately the maximum number of rows for the large datasets, 10,000 and 5000, respectively. We fix D = 10 and generate U and V by sampling all the ui,d and vj,d independently from N(0, 1). The global bias is fixed to z = 7.5. We use N = 10 and repeat the experiment 25 times on each small dataset, and 10 times on each large one. A sketch of this preprocessing and synthetic-data generation is also given after the table. |
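
The Dataset Splits row describes a leave-one-out protocol evaluated with recall at N = 10. The sketch below is an illustrative Python reconstruction of that protocol, not the authors' C implementation; `predict_proba` is a hypothetical placeholder for the paper's predictive probability (equation (3)).

```python
import numpy as np

rng = np.random.default_rng(0)

def leave_one_out_split(X):
    """For each row, move one randomly chosen entry with value one to the test set.

    Assumes every row has at least one one (the preprocessing below discards
    rows with fewer than 10 ones).
    """
    X_train = X.copy()
    test = []  # (row, column) pairs of held-out ones
    for i in range(X.shape[0]):
        ones = np.flatnonzero(X[i])
        j = rng.choice(ones)
        X_train[i, j] = 0
        test.append((i, j))
    return X_train, test

def recall_at_n(X_train, test, predict_proba, n=10):
    """Fraction of rows whose held-out one appears among the top-n scored zero entries."""
    hits = 0
    for i, j_true in test:
        zeros = np.flatnonzero(X_train[i] == 0)
        scores = predict_proba(i, zeros)            # P(x_ij = 1) for each zero entry
        top = zeros[np.argsort(scores)[::-1][:n]]   # column indices of the top-n zeros
        hits += int(j_true in top)
    return hits / len(test)
```

The review notes that each experiment is repeated 25 times (small datasets) or 10 times (large datasets); that outer loop would simply re-run the split and evaluation with different random seeds.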
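
The Experiment Setup row describes column and row filtering, row subsampling, and a synthetic generator with D = 10 latent dimensions and a global bias of 7.5. The following sketch reproduces those steps under assumptions: function names are illustrative, and a logistic link is assumed for the synthetic matrix, which may not match the paper's exact likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(X, n_cols=1000, min_ones=10, n_rows=2000):
    """Keep the densest columns, drop sparse rows, then subsample rows."""
    X = np.asarray(X)
    # Keep the n_cols columns with the highest number of ones.
    top_cols = np.argsort(X.sum(axis=0))[::-1][:n_cols]
    X = X[:, top_cols]
    # Discard rows with fewer than min_ones ones.
    X = X[X.sum(axis=1) >= min_ones]
    # Subsample rows (2000 for the small versions, 40,000 for the large ones).
    if X.shape[0] > n_rows:
        X = X[rng.choice(X.shape[0], size=n_rows, replace=False)]
    return X

def synthetic_matrix(n=2000, m=1000, D=10, bias=7.5):
    """Sample U, V with i.i.d. N(0, 1) entries plus a global bias, then draw a binary matrix.

    The bias value and sign follow the setup quoted above; the logistic link is an assumption.
    """
    U = rng.standard_normal((n, D))
    V = rng.standard_normal((m, D))
    p = 1.0 / (1.0 + np.exp(-(U @ V.T + bias)))
    return (rng.random((n, m)) < p).astype(np.int8)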