Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes
Authors: Insu Han, Mike Gartrell, Elvis Dohmatob, Amin Karbasi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With both a theoretical analysis and experiments on real-world datasets, we verify that our scalable approximate sampling algorithms are orders of magnitude faster than existing sampling approaches for k-NDPPs and NDPPs. |
| Researcher Affiliation | Collaboration | Insu Han (1), Mike Gartrell (2), Elvis Dohmatob (3), Amin Karbasi (1); (1) Yale University, (2) Criteo AI Lab, Paris, France, (3) Facebook AI Lab, Paris, France. Correspondence to: Insu Han <EMAIL>, Mike Gartrell <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 MCMC Sampling for k-NDPP; Algorithm 2 Up Operator via Rejection Sampling; Algorithm 3 Tree-based k-DPP Sampling; Algorithm 4 MCMC Sampling for NDPP |
| Open Source Code | Yes | The source code for our NDPP sampling algorithms is publicly available at https://github.com/insuhan/ndpp-mcmc-sampling. |
| Open Datasets | Yes | UK Retail: This dataset (Chen et al., 2012); Recipe: This dataset (Majumder et al., 2019); Instacart: This dataset (Instacart, 2017); Million Song: This dataset (McFee et al., 2012); Book: This dataset (Wan & McAuley, 2018) |
| Dataset Splits | Yes | We use the training scheme from (Han et al., 2022), where 300 randomly-selected baskets are held-out as a validation set for tracking convergence during training, another 2000 random subsets are used for testing, and the remaining baskets are used for training. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or specific cloud instances) were explicitly provided for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer (Kingma & Ba, 2015)' but does not specify software versions for libraries or programming languages. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015); we initialize D from N(0, 1), and V and B are initialized from U([0, 1]). We set α = β = 0.01 for all datasets. |
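The held-out scheme described in the Dataset Splits row (300 validation baskets, 2000 test baskets, the remainder for training) can be approximated as follows. This is a minimal sketch assuming each basket is a tuple of item ids and that held-out baskets are drawn uniformly at random without replacement; the function name `split_baskets` and the seed handling are hypothetical and not taken from the released code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a basket is a tuple of integer item ids.
Basket = Tuple[int, ...]


def split_baskets(
    baskets: Sequence[Basket],
    n_val: int = 300,
    n_test: int = 2000,
    seed: int = 0,
) -> Tuple[List[Basket], List[Basket], List[Basket]]:
    """Randomly hold out n_val validation and n_test test baskets; the rest are for training."""
    if n_val + n_test > len(baskets):
        raise ValueError("not enough baskets for the requested held-out sizes")
    rng = random.Random(seed)
    order = list(range(len(baskets)))
    rng.shuffle(order)  # uniform random permutation of basket indices
    val = [baskets[i] for i in order[:n_val]]
    test = [baskets[i] for i in order[n_val:n_val + n_test]]
    train = [baskets[i] for i in order[n_val + n_test:]]
    return train, val, test


# Usage on a synthetic collection of 5000 distinct baskets.
baskets = [(i, i + 1) for i in range(5000)]
train, val, test = split_baskets(baskets)
print(len(train), len(val), len(test))  # 2700 300 2000
```

The three splits are disjoint and together cover every basket, so the training set size is simply the total minus 2300.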
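The Pseudocode and Experiment Setup rows refer to MCMC sampling for k-NDPPs and to an initialization of the factors D, V, and B. The sketch below illustrates both under stated assumptions. It assumes the low-rank kernel form L = V Vᵀ + B (D − Dᵀ) Bᵀ, a common NDPP parameterization that the table itself does not state. Because the symmetric part of such an L is V Vᵀ, which is positive semidefinite, every principal minor det(L_S) is nonnegative and can serve as an unnormalized probability. The sampler is a plain Metropolis swap chain, chosen here only as the simplest MCMC baseline; it is not the paper's Algorithm 1, which is built around an up operator implemented via rejection sampling (Algorithm 2). All function names are hypothetical, and the Adam training loop and the α, β hyperparameters are not shown.

```python
import numpy as np


def init_factors(n_items: int, rank: int, seed: int = 0):
    """Initialize D ~ N(0, 1) and V, B ~ U([0, 1]), following the setup row."""
    rng = np.random.default_rng(seed)
    V = rng.uniform(0.0, 1.0, size=(n_items, rank))
    B = rng.uniform(0.0, 1.0, size=(n_items, rank))
    D = rng.standard_normal(size=(rank, rank))
    return V, B, D


def ndpp_kernel(V, B, D):
    """Assumed NDPP kernel: symmetric PSD part V V^T plus skew-symmetric part B (D - D^T) B^T."""
    return V @ V.T + B @ (D - D.T) @ B.T


def mcmc_swap_k_ndpp(L, k, n_steps=1000, seed=0):
    """Metropolis swap chain targeting P(S) proportional to det(L_S) over subsets with |S| = k."""
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    S = list(rng.choice(n, size=k, replace=False))
    det_S = np.linalg.det(L[np.ix_(S, S)])
    for _ in range(n_steps):
        # Symmetric proposal: swap a uniformly chosen member of S
        # for a uniformly chosen item outside S.
        out_pos = rng.integers(k)
        outside = np.setdiff1d(np.arange(n), S)
        T = S.copy()
        T[out_pos] = rng.choice(outside)
        det_T = np.linalg.det(L[np.ix_(T, T)])
        # Accept with probability min(1, det_T / det_S); if the current state
        # has (numerically) zero mass, move unconditionally.
        if det_S <= 0 or rng.random() < det_T / det_S:
            S, det_S = T, det_T
    return sorted(int(i) for i in S)


# Usage: a small synthetic kernel and one approximate sample of size 4.
V, B, D = init_factors(n_items=50, rank=5)
L = ndpp_kernel(V, B, D)
sample = mcmc_swap_k_ndpp(L, k=4, n_steps=500)
print(sample)
```

Each step costs a k × k determinant plus a scan over the ground set, so this baseline is only practical for small examples; scalability to large item collections is precisely what the paper's algorithms target.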