Stratified Sampling Meets Machine Learning

Authors: Edo Liberty, Kevin Lang, Konstantin Shmakov

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental results significantly improve over both uniform sampling and standard stratified sampling, which are the de facto industry standards. In this section we present an array of experimental results using our algorithm. We compare it to uniform sampling and stratified sampling. We also study the effects of varying the number of training examples and the strength of the regularization. This is done for both synthetic and real datasets.
Researcher Affiliation | Industry | Kevin Lang (LANGK@YAHOO-INC.COM), Yahoo Research; Edo Liberty (EDO@YAHOO-INC.COM), Yahoo Research; Konstantin Shmakov (KSHMAKOV@YAHOO-INC.COM), Yahoo Research
Pseudocode | Yes | Algorithm 1 (Train): regularized ERM algorithm. Algorithm 2 (Test): measure expected test error.
Open Source Code | No | The paper does not contain any statement about making its source code openly available, nor does it provide a link to a code repository.
Open Datasets | Yes | DBLP Dataset: this dataset uses a real database from DBLP and synthetic queries. Records correspond to 2,101,151 academic papers from the public DBLP database. From the publicly available DBLP XML file (http://dblp.uni-trier.de/xml/), the authors selected all papers from the 1000 most populous venues.
Dataset Splits | No | The paper specifies training and testing splits (e.g., "The 50,000 random queries were split into 40,000 for training and 10,000 for testing.") but does not mention a separate validation split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory size, or cluster specifications) used for running the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies or libraries with version numbers required to reproduce the experiments.
Experiment Setup | Yes | Our experiments focus exclusively on the relative error defined by L(ŷ, y) = (ŷ/y − 1)^2. As a practical shortcut, this is achievable without modifying Algorithm 1 at all: the only change needed is normalizing all training queries such that y = 1 before executing Algorithm 1. We also study the effects of varying the number of training examples and the strength of the regularization. The x-axis in Figure 2 varies with the value of the parameter η, which controls the strength of regularization.
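The normalization shortcut quoted above can be illustrated with a minimal sketch. The helper names and example values below are hypothetical, and the paper's Algorithm 1 (a regularized ERM solver) is not reproduced; the sketch only checks the identity that makes the shortcut work: the ordinary squared error on a query rescaled so that its target equals 1 coincides with the relative error L(ŷ, y) = (ŷ/y − 1)^2 on the original scale.

```python
# Hypothetical illustration of the y = 1 normalization shortcut.
# Squared loss on a query whose target and prediction are both divided
# by the true answer y equals the relative error on the original scale.

def relative_error(y_hat, y):
    """Relative error L(y_hat, y) = (y_hat / y - 1)^2 from the paper."""
    return (y_hat / y - 1.0) ** 2

def squared_error(y_hat, y):
    """Ordinary squared loss, the loss an unmodified ERM solver minimizes."""
    return (y_hat - y) ** 2

# Example query: true answer y, predicted answer y_hat (made-up numbers).
y, y_hat = 250.0, 300.0

# Normalize the query: divide target and prediction by y, so the target is 1.
y_norm, y_hat_norm = y / y, y_hat / y

# Squared loss on the normalized query equals relative error on the original.
assert abs(squared_error(y_hat_norm, y_norm) - relative_error(y_hat, y)) < 1e-12
```

This is why no change to the training algorithm itself is required: rescaling each training query up front turns a squared-loss minimizer into a relative-error minimizer.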