Self-similar Epochs: Value in Arrangement
Authors: Eliav Buchnik, Edith Cohen, Avinatan Hassidim, Yossi Matias
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design efficient generators of coordinated and LSH-refined microbatches and study the effectiveness of different arrangements through experiments on synthetic stochastic block matrices and on recommendation data sets. We use the popular Skip-gram with Negative Sampling (SGNS) loss objective (Mikolov et al., 2013). We observe consistent training gains of 12%-37% on blocks and of 3%-12% on our real data sets when using coordinated arrangements. (A minimal SGNS loss sketch follows the table.) |
| Researcher Affiliation | Collaboration | ¹Tel Aviv University, Israel; ²Google Research. |
| Pseudocode | Yes | Algorithm 1: Minibatch construction (Focus updates); Algorithm 2: IND microbatches; Algorithm 3: COO microbatches (Focus updates); Algorithm 4: Jaccard LSH map: Focus; Algorithm 5: Angular LSH map: Focus. (A generic MinHash bucketing sketch follows the table.) |
| Open Source Code | No | The paper states, "We implemented our methods in Python using the TensorFlow library (Abadi & et al., 2015). We used the word embedding implementation of (Mikolov et al., 2013; word2vec.py)." It provides a URL for `word2vec.py`, a third-party implementation they used, but there is no explicit statement or link indicating that the authors' own code for their novel arrangement schemes is openly available. |
| Open Datasets | Yes | The MOVIELENS1M dataset (Movielen1M) contains 10^6 reviews by 6·10^3 users of 4·10^3 movies. (Movielen1M) MovieLens 1M Dataset. URL http://grouplens.org/datasets/movielens/1m/. The AMAZON dataset (SNAP) contains 5·10^5 fine food reviews by 2.5·10^5 users on 7.5·10^3 food items. (SNAP) Stanford network analysis project. http://snap.stanford.edu. |
| Dataset Splits | No | We created a test set T+ of positive examples by sampling 20% of the nonzero entries with probabilities proportional to κ_ij. The remaining examples were used for training. We used 5 random splits of the data into test and training sets and 10 runs per split. The paper specifies train/test splits, but does not explicitly describe a separate validation split with specific proportions. (A sampling sketch follows the table.) |
| Hardware Specification | No | The paper states the software used ("implemented our methods in Python using the TensorFlow library") but does not provide specific details about the hardware (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "Python", "the TensorFlow library (Abadi & et al., 2015)", and "word2vec.py" (from the TensorFlow models repository), but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We used a fixed learning rate to facilitate a more accurate comparison of methods and trained with η = 0.005 to η = 0.15. We observed similar relative performance and report results with η = 0.02. We worked with minibatch size parameter values b ∈ {4, 64, 256} (recall that b is the number of positive examples and λ = 10 negative examples are matched with each positive example), and embedding dimension d ∈ {5, 10, 25, 50, 100}. (The grid is written out after the table.) |
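
The SGNS objective cited in the Research Type row is the standard Skip-gram with Negative Sampling loss (Mikolov et al., 2013). As a point of reference only, a minimal NumPy sketch of the per-minibatch loss with b positive examples and λ negatives matched to each positive could look as follows; the names `U`, `V`, `pos_pairs`, and `neg_items` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def sgns_minibatch_loss(U, V, pos_pairs, neg_items):
    """SGNS loss over one minibatch.

    U, V      : focus / context embedding matrices, shapes (n, d) and (m, d).
    pos_pairs : b positive examples as (focus i, context j) index pairs.
    neg_items : for each positive pair, its sampled negative context indices
                (the paper matches lambda = 10 negatives per positive).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    loss = 0.0
    for (i, j), negs in zip(pos_pairs, neg_items):
        loss -= np.log(sigmoid(U[i] @ V[j]))         # attract the positive pair
        for k in negs:
            loss -= np.log(sigmoid(-(U[i] @ V[k])))  # repel the negative samples
    return loss / len(pos_pairs)
```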
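
The Pseudocode row lists Jaccard and angular LSH maps (Algorithms 4-5) used to refine coordinated microbatches. Their exact constructions are not reproduced here; the sketch below only illustrates the generic MinHash idea of bucketing focus entities by their context sets so that entities with high Jaccard similarity can land in the same microbatch. All names and the bucketing policy are assumptions.

```python
import numpy as np
from collections import defaultdict

def minhash_buckets(context_sets, num_hashes=4, seed=0):
    """Bucket focus entities whose context sets collide under a MinHash signature.

    context_sets : dict mapping a focus id to a set of integer context ids.
    Entities in the same bucket have (probabilistically) high Jaccard similarity
    and are candidates for the same coordinated microbatch.
    """
    rng = np.random.default_rng(seed)
    p = 2_147_483_647                       # large prime for universal hashing
    a = rng.integers(1, p, size=num_hashes)
    b = rng.integers(0, p, size=num_hashes)

    buckets = defaultdict(list)
    for focus, ctx in context_sets.items():
        xs = np.fromiter(ctx, dtype=np.int64)
        sig = tuple(int(((a[k] * xs + b[k]) % p).min()) for k in range(num_hashes))
        buckets[sig].append(focus)
    return buckets
```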
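
The Dataset Splits row quotes a procedure that samples 20% of the nonzero entries as a test set with probability proportional to κ_ij. A rough NumPy sketch of such a split is given below; the function name and the choice of weighted sampling without replacement are assumptions, since the paper does not spell out those details.

```python
import numpy as np

def split_test_train(rows, cols, kappa, test_frac=0.2, seed=0):
    """Hold out roughly test_frac of the nonzero entries as positive test examples.

    rows, cols : indices (i, j) of the nonzero entries of the association matrix.
    kappa      : the corresponding weights kappa_ij, used as sampling weights.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.asarray(rows), np.asarray(cols)
    probs = np.asarray(kappa, dtype=float)
    probs /= probs.sum()

    n_test = int(round(test_frac * len(probs)))
    test_idx = rng.choice(len(probs), size=n_test, replace=False, p=probs)

    mask = np.zeros(len(probs), dtype=bool)
    mask[test_idx] = True
    test = (rows[mask], cols[mask])       # T+ : held-out positive examples
    train = (rows[~mask], cols[~mask])    # remaining entries used for training
    return train, test
```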
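
Finally, the hyperparameter grid reported in the Experiment Setup row can be written out directly; the variable and key names below are illustrative, not taken from the paper's code.

```python
from itertools import product

ETA = 0.02                          # fixed learning rate used for reported results
LAM = 10                            # negative examples matched per positive example
BATCH_SIZES = [4, 64, 256]          # minibatch size parameter b (positive examples)
DIMENSIONS = [5, 10, 25, 50, 100]   # embedding dimension d

configs = [{"eta": ETA, "lam": LAM, "b": b, "d": d}
           for b, d in product(BATCH_SIZES, DIMENSIONS)]
```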