Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Context Selection for Embedding Models
Authors: Liping Liu, Francisco Ruiz, Susan Athey, David Blei
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run a comprehensive experimental study on three datasets, namely, MovieLens for movie recommendations, eBird-PA for bird watching events, and grocery data for shopping behavior. We found that CS-EFE consistently outperforms EFE in terms of held-out predictive performance on the three datasets. |
| Researcher Affiliation | Academia | Li-Ping Liu (Tufts University); Francisco J. R. Ruiz (Columbia University, University of Cambridge); Susan Athey (Stanford University); David M. Blei (Columbia University) |
| Pseudocode | No | No pseudocode or algorithm block was found in the paper. |
| Open Source Code | Yes | The code is in the github repo: https://github.com/blei-lab/context-selection-embedding |
| Open Datasets | Yes | MovieLens: We consider the MovieLens-100K dataset (Harper and Konstan, 2015)... eBird-PA: The eBird data (Munson et al., 2015; Sullivan et al., 2009) contains information about a set of bird observation events. |
| Dataset Splits | Yes | MovieLens: We set aside 9% of the data for validation and 10% for test. eBird-PA: We split the data into train (67%), test (26%), and validation (7%) sets. Market-Basket: We split the data into training (86%), test (5%), and validation (9%) sets. |
| Hardware Specification | Yes | We also acknowledge the support of NVIDIA Corporation with the donation of two GPUs used for this research. |
| Software Dependencies | No | We use stochastic gradient descent to maximize the objective function, adaptively setting the stepsize with Adam (Kingma and Ba, 2015). (No specific software versions provided for reproducibility.) |
| Experiment Setup | Yes | We explore different values for the dimensionality K of the embedding vectors. ... We use negative sampling (Rudolph et al., 2016) with a ratio of 1/10 of positive (non-zero) versus negative samples. We use stochastic gradient descent to maximize the objective function, adaptively setting the stepsize with Adam (Kingma and Ba, 2015)... We consider unit-variance ℓ2-regularization, and the weight of the regularization term is fixed to 1.0. ... In the context selection for exponential family embeddings (CS-EFE) model, we set the number of hidden units to 30 and 15 for each of the hidden layers, and we consider 40 bins to form the histogram. |
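
The split percentages quoted in the Dataset Splits row can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `split_indices`, the `SPLITS` table, the seed, and the shuffling strategy are all hypothetical, and the 81% training share for MovieLens is inferred as the remainder after the quoted 9% validation and 10% test. The report does not say which unit is split (ratings, events, or baskets), so the sketch works on plain item indices.

```python
import random

# (train, validation, test) fractions from the Dataset Splits row.
# The MovieLens training fraction (0.81) is an inferred remainder, not a quoted figure.
SPLITS = {
    "MovieLens": (0.81, 0.09, 0.10),
    "eBird-PA": (0.67, 0.07, 0.26),
    "Market-Basket": (0.86, 0.09, 0.05),
}


def split_indices(n_items, fractions, seed=0):
    """Shuffle item indices and cut them into train/validation/test lists.

    Hypothetical helper: the paper's actual splitting procedure is not
    described in this report.
    """
    train_frac, valid_frac, test_frac = fractions
    if abs(train_frac + valid_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for repeatability

    n_valid = round(n_items * valid_frac)
    n_test = round(n_items * test_frac)
    n_train = n_items - n_valid - n_test  # training set absorbs rounding error

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    for name, fractions in SPLITS.items():
        train, valid, test = split_indices(1000, fractions)
        print(name, len(train), len(valid), len(test))
```

Giving the training set whatever remains after rounding keeps the three parts disjoint and exhaustive for any item count; a real pipeline might instead split by user, by event, or by basket, depending on the dataset.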