Scaling-Up Split-Merge MCMC with Locality Sensitive Sampling (LSS)
Authors: Chen Luo, Anshumali Shrivastava (pp. 4464-4471)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Overall, we obtain a superior tradeoff between convergence and per update cost. As a direct consequence, our proposals are around 6X faster than the state-of-the-art sampling methods on two large real datasets KDDCUP and PubMed with several millions of entities and thousands of clusters. |
| Researcher Affiliation | Academia | Chen Luo, Anshumali Shrivastava Department of Computer Science, Rice University {cl67, anshumali}@rice.edu |
| Pseudocode | No | The paper describes the proposed algorithms textually and with mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a direct link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate the effectiveness of our algorithm on two large real-world datasets: KDDCUP and PubMed. KDDCUP data was used in the KDD Cup 2004 data mining competition. [...] [1] https://cs.joensuu.fi/sipu/datasets/ The PubMed abstracts dataset [...] [2] www.pubmed.gov |
| Dataset Splits | No | The paper mentions using KDDCUP, PubMed, and synthetic datasets but does not explicitly state the proportions or methodology for train/validation/test splits for any of them. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | No | The paper mentions parameters like K and L for LSH methods (a generic (K, L) LSH construction is sketched below for context) and k for synthetic data generation, but it does not specify general experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings. |
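
Since the paper contains no pseudocode (see the Pseudocode row above), the following is only a minimal, hypothetical sketch of a generic (K, L) signed-random-projection LSH sampler, intended to show what the K and L parameters mentioned in the Experiment Setup row control. It is not the authors' LSS procedure: the class name `LSHSampler`, its methods, and all defaults are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sketch of a (K, L) signed-random-projection LSH sampler.
# NOT the paper's LSS algorithm (no pseudocode is published); it only
# illustrates the generic K (hashes per table) / L (tables) parameters.

class LSHSampler:
    def __init__(self, dim, K=8, L=16, seed=0):
        rng = np.random.default_rng(seed)
        # L independent tables, each hashing with K random hyperplanes.
        self.planes = rng.standard_normal((L, K, dim))
        self.tables = [dict() for _ in range(L)]

    def _keys(self, x):
        # K sign bits per table, packed into an integer bucket key.
        bits = (np.einsum('lkd,d->lk', self.planes, x) > 0).astype(int)
        return [int(''.join(map(str, row)), 2) for row in bits]

    def insert(self, idx, x):
        for table, key in zip(self.tables, self._keys(x)):
            table.setdefault(key, []).append(idx)

    def sample_near(self, x, rng):
        # Pick a random table, then a random point from x's bucket.
        # Collisions are more likely for points similar to x, so the
        # draw is biased toward nearby points (locality sensitive).
        keys = self._keys(x)
        for t in rng.permutation(len(self.tables)):
            bucket = self.tables[t].get(keys[t], [])
            if bucket:
                return bucket[rng.integers(len(bucket))]
        return None  # no collision in any table

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 32))
    sampler = LSHSampler(dim=32, K=6, L=8)
    for i, x in enumerate(X):
        sampler.insert(i, x)
    # Likely returns the index of a point similar to X[0].
    print(sampler.sample_near(X[0], rng))
```

On the design choice: larger K makes buckets sparser, so collisions are fewer but closer; larger L raises the chance that a near neighbor collides in at least one table. That tradeoff is what K and L tune in any LSH-based scheme, which is why the reproducibility assessment flags their unspecified values as an experiment-setup gap.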