Self-similar Epochs: Value in Arrangement
Authors: Eliav Buchnik, Edith Cohen, Avinatan Hassidim, Yossi Matias
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design efficient generators of coordinated and LSH-refined microbatches and study the effectiveness of different arrangements through experiments on synthetic stochastic block matrices and on recommendation data sets. We use the popular Skip-gram with Negative Sampling (SGNS) loss objective (Mikolov et al., 2013). We observe consistent training gains of 12%-37% on blocks and of 3%-12% on our real data sets when using coordinated arrangements. (A minimal SGNS loss sketch follows the table.) |
| Researcher Affiliation | Collaboration | ¹Tel Aviv University, Israel; ²Google Research. |
| Pseudocode | Yes | Algorithm 1: Minibatch construction (Focus updates); Algorithm 2: IND microbatches; Algorithm 3: COO microbatches (Focus updates); Algorithm 4: Jaccard LSH map: Focus; Algorithm 5: Angular LSH map: Focus. (A generic MinHash bucketing sketch follows the table.) |
| Open Source Code | No | The paper states, "We implemented our methods in Python using the TensorFlow library (Abadi & et al., 2015). We used the word embedding implementation of (Mikolov et al., 2013; word2vec.py)." It provides a URL for `word2vec.py`, a third-party implementation they used, but there is no explicit statement or link indicating that the authors' own code for their novel arrangement schemes is openly available. |
| Open Datasets | Yes | The MOVIELENS1M dataset (Movielen1M) contains 10^6 reviews by 6·10^3 users of 4·10^3 movies. (Movielen1M) MovieLens 1M Dataset. URL http://grouplens.org/datasets/movielens/1m/. The AMAZON dataset (SNAP) contains 5·10^5 fine food reviews by 2.5·10^5 users on 7.5·10^3 food items. (SNAP) Stanford network analysis project. http://snap.stanford.edu. |
| Dataset Splits | No | We created a test set T+ of positive examples by sampling 20% of the nonzero entries with probabilities proportional to κ_ij. The remaining examples were used for training. We used 5 random splits of the data into test and training sets and 10 runs per split. The paper specifies train/test splits, but does not explicitly describe a separate validation split with specific proportions. (A sampling sketch follows the table.) |
| Hardware Specification | No | The paper states the software used ("implemented our methods in Python using the TensorFlow library") but does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Python", "the TensorFlow library (Abadi & et al., 2015)", and "word2vec.py" (from the TensorFlow models repository), but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We used a fixed learning rate to facilitate a more accurate comparison of methods and trained with η = 0.005 to η = 0.15. We observed similar relative performance and report results with η = 0.02. We worked with minibatch size parameter values b ∈ {4, 64, 256} (recall that b is the number of positive examples and λ = 10 negative examples are matched with each positive example), and embedding dimension d ∈ {5, 10, 25, 50, 100}. (The grid is written out after the table.) |
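
The SGNS objective cited in the Research Type row is the standard Skip-gram with Negative Sampling loss (Mikolov et al., 2013). As a point of reference only, a minimal NumPy sketch of the per-minibatch loss with b positive examples and λ negatives matched to each positive could look as follows; the names `U`, `V`, `pos_pairs`, and `neg_items` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def sgns_minibatch_loss(U, V, pos_pairs, neg_items):
    """SGNS loss over one minibatch.

    U, V      : focus / context embedding matrices, shapes (n, d) and (m, d).
    pos_pairs : b positive examples as (focus i, context j) index pairs.
    neg_items : for each positive pair, its sampled negative context indices
                (the paper matches lambda = 10 negatives per positive).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    loss = 0.0
    for (i, j), negs in zip(pos_pairs, neg_items):
        loss -= np.log(sigmoid(U[i] @ V[j]))         # attract the positive pair
        for k in negs:
            loss -= np.log(sigmoid(-(U[i] @ V[k])))  # repel the negative samples
    return loss / len(pos_pairs)
```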
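
The Pseudocode row lists Jaccard and angular LSH maps (Algorithms 4-5) used to refine coordinated microbatches. Their exact constructions are not reproduced here; the sketch below only illustrates the generic MinHash idea of bucketing focus entities by their context sets so that entities with high Jaccard similarity can land in the same microbatch. All names and the bucketing policy are assumptions.

```python
import numpy as np
from collections import defaultdict

def minhash_buckets(context_sets, num_hashes=4, seed=0):
    """Bucket focus entities whose context sets collide under a MinHash signature.

    context_sets : dict mapping a focus id to a set of integer context ids.
    Entities in the same bucket have (probabilistically) high Jaccard similarity
    and are candidates for the same coordinated microbatch.
    """
    rng = np.random.default_rng(seed)
    p = 2_147_483_647                       # large prime for universal hashing
    a = rng.integers(1, p, size=num_hashes)
    b = rng.integers(0, p, size=num_hashes)

    buckets = defaultdict(list)
    for focus, ctx in context_sets.items():
        xs = np.fromiter(ctx, dtype=np.int64)
        sig = tuple(int(((a[k] * xs + b[k]) % p).min()) for k in range(num_hashes))
        buckets[sig].append(focus)
    return buckets
```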
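
The Dataset Splits row quotes a procedure that samples 20% of the nonzero entries as a test set with probability proportional to κ_ij. A rough NumPy sketch of such a split is given below; the function name and the choice of weighted sampling without replacement are assumptions, since the paper does not spell out those details.

```python
import numpy as np

def split_test_train(rows, cols, kappa, test_frac=0.2, seed=0):
    """Hold out roughly test_frac of the nonzero entries as positive test examples.

    rows, cols : indices (i, j) of the nonzero entries of the association matrix.
    kappa      : the corresponding weights kappa_ij, used as sampling weights.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.asarray(rows), np.asarray(cols)
    probs = np.asarray(kappa, dtype=float)
    probs /= probs.sum()

    n_test = int(round(test_frac * len(probs)))
    test_idx = rng.choice(len(probs), size=n_test, replace=False, p=probs)

    mask = np.zeros(len(probs), dtype=bool)
    mask[test_idx] = True
    test = (rows[mask], cols[mask])       # T+ : held-out positive examples
    train = (rows[~mask], cols[~mask])    # remaining entries used for training
    return train, test
```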
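
Finally, the hyperparameter grid reported in the Experiment Setup row can be written out directly; the variable and key names below are illustrative, not taken from the paper's code.

```python
from itertools import product

ETA = 0.02                          # fixed learning rate used for reported results
LAM = 10                            # negative examples matched per positive example
BATCH_SIZES = [4, 64, 256]          # minibatch size parameter b (positive examples)
DIMENSIONS = [5, 10, 25, 50, 100]   # embedding dimension d

configs = [{"eta": ETA, "lam": LAM, "b": b, "d": d}
           for b, d in product(BATCH_SIZES, DIMENSIONS)]
```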