Learning and Inference via Maximum Inner Product Search
Authors: Stephen Mussmann, Stefano Ermon
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that it performs well both on synthetic data and neural language models with large output spaces. The main purpose of our empirical evaluation is to demonstrate that our MIPS reduction using Gumbels (MRG) doesn't affect the accuracy of sampling or inference. We show the results of the reduction on model averaging and learning via gradient descent, two tasks introduced in the Background section. We also show the empirical speedup achieved using a particular MIPS technique. |
| Researcher Affiliation | Academia | Stephen Mussmann MUSSMANN@STANFORD.EDU Stanford University, 450 Serra Mall, Stanford, CA 94305 USA |
| Pseudocode | Yes | Algorithm 1 MIPS-Gumbel Initialization, Algorithm 2 MIPS-Gumbel t Samples, Algorithm 3 MIPS-Gumbel Inverse Partition Estimate |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The real data we use is the word2vec dataset, a word embedding dataset released by Google (Mikolov et al., 2013a;b). |
| Dataset Splits | No | The paper mentions using synthetic data and word2vec data but does not provide explicit training, validation, or test dataset splits. |
| Hardware Specification | No | The paper only states 'a single core running in python' without specifying any particular CPU, GPU models, or detailed hardware specifications. |
| Software Dependencies | No | The paper mentions 'python' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For the MIPS reduction, k = 5 and t = 100. In general, a larger k will make the samples for different θ less dependent and a larger t will decrease the variance of the estimate. ... and a Gaussian prior is put on the parameters. We achieve this by using Equation 5 with an extra term for the Gaussian prior. (A sketch of the Gumbel-based reduction these parameters refer to follows the table.) |
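
The Pseudocode and Experiment Setup rows refer to a Gumbel-based reduction to maximum inner product search. The NumPy sketch below illustrates only the standard Gumbel-max facts that such a reduction relies on: perturbing each score θ·xᵢ with Gumbel(0, 1) noise and taking the argmax yields a softmax sample, and the perturbed maximum is Gumbel(log Z, 1) distributed, so averaging exp(−max) gives an unbiased estimate of 1/Z. The function names (`augment_with_gumbels`, `gumbel_max_sample`, `inverse_partition_estimate`), the use of an exact argmax in place of a fast MIPS index, and the mapping of `num_copies` onto the paper's k are illustrative assumptions, not a reproduction of the paper's Algorithms 1-3.

```python
import numpy as np

def augment_with_gumbels(X, num_copies, rng):
    """Attach num_copies independent Gumbel(0, 1) perturbations to each item
    vector, so the perturbed score theta . x_i + g_ij becomes a plain inner
    product in an augmented space. X is an (n, d) matrix of item vectors."""
    gumbels = rng.gumbel(size=(X.shape[0], num_copies))
    return np.hstack([X, gumbels])

def gumbel_max_sample(X_aug, theta, copy_index, num_copies):
    """One softmax sample via the Gumbel-max trick: argmax_i (theta . x_i + g_i)
    is distributed as softmax(theta . x_i). The augmented query [theta; e_j]
    turns this into an inner product maximization; the exact argmax below
    stands in for a fast MIPS data structure."""
    query = np.concatenate([theta, np.eye(num_copies)[copy_index]])
    return int(np.argmax(X_aug @ query))

def inverse_partition_estimate(X_aug, theta, num_copies):
    """Unbiased estimate of 1 / Z with Z = sum_i exp(theta . x_i): each
    perturbed maximum M_j is Gumbel(log Z, 1), and E[exp(-M_j)] = 1 / Z."""
    d = theta.shape[0]
    perturbed = (X_aug[:, :d] @ theta)[:, None] + X_aug[:, d:]  # (n, num_copies)
    return float(np.mean(np.exp(-perturbed.max(axis=0))))

# Illustrative usage with made-up sizes; k = 5 mirrors the experiment setup row.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 64))                      # item / output-layer vectors
X_aug = augment_with_gumbels(X, num_copies=5, rng=rng)
theta = rng.normal(size=64)
sample = gumbel_max_sample(X_aug, theta, copy_index=0, num_copies=5)
z_inverse = inverse_partition_estimate(X_aug, theta, num_copies=5)
```

Reusing the same Gumbel columns across different θ makes their samples dependent, which is why a larger number of independent copies (the paper's k) reduces that dependence, while averaging more perturbed maxima (the paper's t) reduces the variance of the partition estimate.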