Scalable Sampling for Nonsymmetric Determinantal Point Processes

Authors: Insu Han, Mike Gartrell, Jennifer Gillenwater, Elvis Dohmatob, Amin Karbasi

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments we compare the speed of all of these samplers for a variety of real-world tasks." "In Table 2, we observe that the predictive performance of our ONDPP models generally match or sometimes exceed the baseline."
Researcher Affiliation | Collaboration | Insu Han (Yale University, insu.han@yale.edu); Mike Gartrell (Criteo AI Lab, m.gartrell@criteo.com); Jennifer Gillenwater (Google Research, jengi@google.com); Elvis Dohmatob (Facebook AI Research, dohmatob@fb.com); Amin Karbasi (Yale University, amin.karbasi@yale.edu)
Pseudocode | Yes | Algorithm 1: Cholesky-based NDPP sampling (Poulson, 2019, Algorithm 1); Algorithm 2: Rejection NDPP sampling (tree-based sampling); Algorithm 3: Tree-based DPP sampling (Gillenwater et al., 2019); Algorithm 4: Youla decomposition of a low-rank skew-symmetric matrix
Open Source Code | Yes | "All of the code implementing our constrained learning and sampling algorithms is publicly available." "The proofs for our theoretical contributions are available in Appendix E. For our experiments, all dataset processing steps, experimental procedures, and hyperparameter settings are described in Appendices A, B, and C, respectively." (Footnote: https://github.com/insuhan/nonsymmetric-dpp-sampling)
Open Datasets | Yes | UK Retail (Chen et al., 2012): baskets representing transactions from an online retail company that sells all-occasion gifts. Recipe (Majumder et al., 2019): recipes and food reviews from Food.com (formerly Genius Kitchen). Instacart (Instacart, 2017): baskets purchased by Instacart users. Million Song (McFee et al., 2012): playlists ("baskets") of songs from Echo Nest users. Book (Wan & McAuley, 2018): reviews from the Goodreads book review website, including a variety of attributes describing the items. (Associated footnotes give URLs for the Recipe, Instacart, Million Song, and Book datasets.)
Dataset Splits | Yes | "We use 300 randomly-selected baskets as a held-out validation set, for tracking convergence during training and for tuning hyperparameters. Another 2000 random baskets are used for testing, and the rest are used for training."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | "We use PyTorch with Adam (Kingma & Ba, 2015) for optimization." "We use PyTorch's linalg.solve to avoid the expense of explicitly computing the (BᵀB)⁻¹ inverse." No specific version numbers are provided for PyTorch or other libraries.
Experiment Setup | Yes | "We perform a grid search using a held-out validation set to select the best-performing hyperparameters for each model and dataset." "The hyperparameter settings used for each model and dataset are described below." "For all of the above model configurations and datasets, we use a batch size of 800 during training."
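The pseudocode row cites a Cholesky-based sampler (Poulson, 2019, Algorithm 1) as the starting point for the paper's Algorithm 1. A minimal NumPy sketch of that underlying idea is below — shown here for a symmetric marginal kernel K with 0 ⪯ K ⪯ I, not the paper's nonsymmetric extension; the toy kernel construction is an assumption for illustration only:

```python
import numpy as np

def cholesky_dpp_sample(K, rng):
    """Sketch of Cholesky-based DPP sampling (Poulson, 2019, Alg. 1).

    Sweeps items in order: includes item j with its current conditional
    marginal probability, then eliminates j via a Schur-complement
    (Cholesky-style) update of the remaining submatrix.
    """
    A = np.array(K, dtype=float)  # work on a copy; A[j, j] is updated in place
    n = A.shape[0]
    sample = []
    for j in range(n):
        if rng.random() < A[j, j]:
            sample.append(j)       # include j with prob A[j, j]
        else:
            A[j, j] -= 1.0         # condition on excluding j
        A[j + 1:, j] /= A[j, j]    # elimination column
        A[j + 1:, j + 1:] -= np.outer(A[j + 1:, j], A[j, j + 1:])
    return sample

# Toy PSD marginal kernel with all eigenvalues strictly below 1
# (eigenvalues sum to tr(K)/(tr(K)+1) < 1 after the rescaling).
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
K = (G @ G.T) / (np.trace(G @ G.T) + 1.0)
print(cholesky_dpp_sample(K, rng))
```

The single O(n³) sweep is what makes this sampler attractive; the paper's contribution is making the analogous computation scale for low-rank nonsymmetric kernels.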
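The dataset-splits row describes a simple random partition: 300 baskets for validation, 2,000 for testing, and the remainder for training. A sketch of that split, with dummy baskets and a hypothetical `split_baskets` helper standing in for the authors' preprocessing:

```python
import numpy as np

def split_baskets(baskets, n_val=300, n_test=2000, seed=0):
    """Randomly partition baskets into train/validation/test sets,
    mirroring the split sizes reported in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(baskets))
    val = [baskets[i] for i in idx[:n_val]]
    test = [baskets[i] for i in idx[n_val:n_val + n_test]]
    train = [baskets[i] for i in idx[n_val + n_test:]]
    return train, val, test

baskets = [[i, i + 1] for i in range(5000)]   # placeholder baskets
train, val, test = split_baskets(baskets)
print(len(train), len(val), len(test))        # → 2700 300 2000
```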
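The software-dependencies row notes that the authors call PyTorch's `linalg.solve` rather than forming (BᵀB)⁻¹ explicitly. The same pattern in NumPy (a sketch — B here is just a random tall matrix, not the paper's kernel factor):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((100, 10))   # tall factor with full column rank
C = rng.standard_normal((10, 3))

# Explicit inverse: forms (B^T B)^{-1} and then multiplies.
# More work and worse numerical behavior than solving directly.
X_inv = np.linalg.inv(B.T @ B) @ C

# Preferred: solve the linear system (B^T B) X = C in one call.
X_solve = np.linalg.solve(B.T @ B, C)

assert np.allclose(X_inv, X_solve)
```

`torch.linalg.solve` has the same semantics, so the identical refactor applies in the authors' PyTorch code.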