Matrix Completion in the Unit Hypercube via Structured Matrix Factorization
Authors: Emanuele Bugliarello, Swayambhoo Jain, Vineeth Rakesh
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness of our proposed models by extensive numerical tests on our VFX dataset and two additional datasets with values that are also bounded in the [0, 1] interval. |
| Researcher Affiliation | Collaboration | Emanuele Bugliarello (Tokyo Institute of Technology); Swayambhoo Jain and Vineeth Rakesh (Technicolor AI Lab) |
| Pseudocode | No | The paper describes algorithms but does not present them in structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Experimental results on public data are available on GitHub. URL: https://github.com/e-bug/unit-mf. |
| Open Datasets | Yes | In our third application, we consider the task of placing online advertisements in categories of websites (e.g., Entertainment, Finance, etc.) to maximize their click-through rate. To simulate this scenario, we use publicly available data from a Kaggle competition run by the advertisement company Outbrain. This dataset contains users' webpage views and clicks on multiple publisher sites in the United States over a two-week period in June 2016. URL: https://www.kaggle.com/c/outbrain-click-prediction. |
| Dataset Splits | Yes | We use 3 rounds of Monte Carlo cross-validation on the movie production data (due to few non-missing entries) and 3-fold cross-validation on the OTT and CTR data. (A sketch of both split schemes follows the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper describes the learning algorithms used but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use a maximum number of 100 epochs and a tolerance defined by \|L(t) - L(t-1)\| / L(t) < 10^-6 as stopping criteria for training, where L(t) is the objective cost at epoch t. In all SGD-based algorithms, we use batches of 8 entries on the VFX data, and of 128 entries on the OTT and CTR data. The best number of latent factors K is searched over all possible values in the VFX data, while we use the common values of K ∈ {10, 15, 20} in the larger OTT and CTR data. Each matrix factor is initialized with uniformly random numbers in (0, 1), and biases are initialized as zero vectors. (A sketch of this training setup also follows the table.) |
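
Below is a minimal sketch of the two validation schemes quoted in the Dataset Splits row, assuming the splits are drawn over the index set of observed matrix entries. The scikit-learn utilities and the variable names (e.g., `observed`) are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit

rng = np.random.RandomState(0)

# Hypothetical observed entries of a partially observed matrix,
# stored as (row index, column index, value-in-[0, 1]) triplets.
observed = np.array([(i, j, rng.rand())
                     for i in range(50) for j in range(20)
                     if rng.rand() < 0.3])

# 3 rounds of Monte Carlo cross-validation (repeated random hold-out),
# as used on the small movie-production (VFX) data.
mc = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)
for train_idx, test_idx in mc.split(observed):
    train, test = observed[train_idx], observed[test_idx]
    # ... fit a model on `train`, evaluate on `test` ...

# 3-fold cross-validation, as used on the larger OTT and CTR data.
kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(observed):
    train, test = observed[train_idx], observed[test_idx]
    # ... fit a model on `train`, evaluate on `test` ...
```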
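
The Experiment Setup row can likewise be read as the following sketch. It assumes a plain squared-error matrix-factorization objective as a stand-in for the paper's actual models; the function name `train_mf` and the learning rate are hypothetical, while the (0, 1) uniform initialization, zero biases, batch sizes, 100-epoch cap, and relative-change stopping rule follow the quoted text.

```python
import numpy as np

def train_mf(entries, n_rows, n_cols, K=10, batch_size=128, lr=0.01,
             max_epochs=100, tol=1e-6, seed=0):
    """Mini-batch SGD sketch; `entries` is an (N, 3) array of (row, col, value)."""
    rng = np.random.RandomState(seed)
    # Each matrix factor is initialized with uniformly random numbers in (0, 1);
    # biases are initialized as zero vectors (as stated in the quoted setup).
    U = rng.uniform(0.0, 1.0, size=(n_rows, K))
    V = rng.uniform(0.0, 1.0, size=(n_cols, K))
    bu = np.zeros(n_rows)
    bv = np.zeros(n_cols)

    def objective():
        i, j, x = entries[:, 0].astype(int), entries[:, 1].astype(int), entries[:, 2]
        pred = np.sum(U[i] * V[j], axis=1) + bu[i] + bv[j]
        return np.mean((pred - x) ** 2)  # placeholder squared-error cost

    prev_loss = None
    for epoch in range(max_epochs):       # maximum of 100 epochs
        rng.shuffle(entries)              # shuffle observed entries in place
        for start in range(0, len(entries), batch_size):
            batch = entries[start:start + batch_size]
            i = batch[:, 0].astype(int)
            j = batch[:, 1].astype(int)
            x = batch[:, 2]
            err = np.sum(U[i] * V[j], axis=1) + bu[i] + bv[j] - x
            dU = lr * err[:, None] * V[j]
            dV = lr * err[:, None] * U[i]
            # np.subtract.at accumulates updates correctly for repeated indices.
            np.subtract.at(U, i, dU)
            np.subtract.at(V, j, dV)
            np.subtract.at(bu, i, lr * err)
            np.subtract.at(bv, j, lr * err)

        loss = objective()                # L(t): objective cost at epoch t
        # Stopping criterion: |L(t) - L(t-1)| / L(t) < 1e-6.
        if prev_loss is not None and abs(loss - prev_loss) / loss < tol:
            break
        prev_loss = loss

    return U, V, bu, bv
```

In this reading, the number of latent factors K (e.g., 10, 15, or 20 on the OTT and CTR data) would be selected with the cross-validation splits sketched above, and the batch size would be set to 8 or 128 depending on the dataset.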