Submodular Hamming Metrics
Authors: Jennifer A. Gillenwater, Rishabh K. Iyer, Bethany Lusch, Rahul Kidambi, Jeff A. Bilmes
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, we demonstrate empirically the effectiveness of these metrics and associated algorithms on both a metric minimization task (a form of clustering) and also a metric maximization task (generating diverse k-best lists). |
| Researcher Affiliation | Academia | University of Washington, Dept. of EE, Seattle, U.S.A. University of Washington, Dept. of Applied Math, Seattle, U.S.A. {jengi, rkiyer, herwaldt, rkidambi, bilmes}@uw.edu |
| Pseudocode | Yes | Algorithm 1 UNION-SPLIT; Algorithm 2 BEST-B; Algorithm 3 MAJOR-MIN |
| Open Source Code | No | The paper mentions using the 'WORD2VEC code of [26]' (a third-party tool) but does not provide any statement or link indicating that the authors' own implementation code for their proposed methods is open-sourced or publicly available. |
| Open Datasets | Yes | Moving beyond synthetic data, we applied the same method to the problem of clustering NIPS papers. The initial set of documents that we consider consists of all NIPS papers from 1987 to 2014 (papers were downloaded from http://papers.nips.cc/). ... We employ the image summarization dataset from [8], which consists of 14 image collections, each of which contains n = 100 images. |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, or test dataset splits (e.g., percentages, counts, or predefined split references). It describes data processing and evaluation metrics, but not the partitioning of data for these stages. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using 'the WORD2VEC code of [26]' but does not provide specific version numbers for this or any other software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For synthetic data: 'We set the number of word features to n = 1000, and partition the features into 100 word classes... We set the minimum cluster center size to ℓ = 100. We use k-means++ initialization [25] and average over 10 trials.' For NIPS data: 'We set the center size cardinality constraint to ℓ = 100 and set the number of document clusters to k = 10. To initialize, we again use k-means++ [25], with k = 10. Results are averaged over 10 trials.' For diverse k-best: 'For each image collection, we seek k = 15 summaries of size ℓ = 10. For g we use the facility location function: g(A) = Σ_{i∈V} max_{j∈A} S_{ij}, where S_{ij} is a similarity score for images i and j. We compute S_{ij} by taking the dot product of the ith and jth feature vectors'. |
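The facility location function quoted in the experiment setup is a standard monotone submodular objective. A minimal sketch of how it could be evaluated is below; the function name, the toy data, and the use of NumPy are illustrative assumptions, not code from the paper.

```python
import numpy as np

def facility_location(S, A):
    """Facility location value g(A) = sum_{i in V} max_{j in A} S[i, j].

    S : (n, n) similarity matrix over the ground set V = {0, ..., n-1}
    A : iterable of selected indices (the summary)
    Returns 0.0 for the empty set by convention.
    """
    A = list(A)
    if not A:
        return 0.0
    # For each ground-set element i, take its best similarity to the
    # selected set A, then sum over all i.
    return float(S[:, A].max(axis=1).sum())

# Hypothetical example: similarities from dot products of feature
# vectors, mirroring how the paper describes computing S_ij.
rng = np.random.default_rng(0)
X = rng.random((5, 3))   # 5 items with 3-dim features (toy data)
S = X @ X.T              # S_ij = <x_i, x_j>
print(facility_location(S, {0, 2}))
```

Because g is monotone submodular, a greedy selection of ℓ items under this objective carries the usual (1 - 1/e) approximation guarantee, which is why it is a common choice for summarization experiments.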