Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Adaptive Sampling for SGD by Exploiting Side Information

Authors: Siddharth Gopal

ICML 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on highly multiclass datasets show that our proposal converge significantly faster than existing techniques.
Researcher Affiliation	Industry	Siddharth Gopal EMAIL Google Inc, 1600 Amphitheatre Parkway
Pseudocode	Yes	The complete pseudocode is given in Algorithm 1.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets	Yes	We used three datasets for experimentation ALOI, CIFAR100 and IPC, 1. ALOI 1 : An image database... (footnote: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html) ... 3. IPC 2 : A set of 75,250 patents... (footnote: http://gcdart.blogspot.com/2012/08/datasets_929.html)
Dataset Splits	Yes	Unless otherwise noted, all learning rates were carefully tuned (using the scheme in (Bottou, 2010)) to achieve the lowest objective at the cutoff point and the regularization was set using a 20% validation set.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers, only referencing algorithms or general concepts like 'AdaGrad'.
Experiment Setup	Yes	For all the experiments, we define c_i to be the class-label associated with the instance. We generically set J = N/4 and δ = 0.5. The choice for J was made so as to ensure no noticeable increase in the computational cost and δ was set to a midpoint value between the two distributions. Unless otherwise noted, all learning rates were carefully tuned (using the scheme in (Bottou, 2010)) to achieve the lowest objective at the cutoff point and the regularization was set using a 20% validation set.
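
The setup quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's code: the function name `experiment_setup`, the `seed` argument, the use of integer division for J = N/4, and the reading of N as the number of training instances left after the 20% validation hold-out are all assumptions that the quoted text does not confirm.

```python
import random


def experiment_setup(n_instances, val_fraction=0.2, seed=0):
    """Hypothetical helper: hold out a validation set and derive J and delta.

    Returns (J, delta, train_idx, val_idx).
    """
    rng = random.Random(seed)
    indices = list(range(n_instances))
    rng.shuffle(indices)

    # 20% validation set, used in the quoted setup to choose the regularization.
    n_val = int(round(val_fraction * n_instances))
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])

    # J = N/4; taking N as the training-set size and rounding down
    # are assumptions made for this sketch.
    J = len(train_idx) // 4

    # delta = 0.5, the midpoint value quoted in the setup.
    delta = 0.5
    return J, delta, train_idx, val_idx


if __name__ == "__main__":
    J, delta, train_idx, val_idx = experiment_setup(1000)
    print(J, delta, len(train_idx), len(val_idx))
```

Learning-rate tuning is left out on purpose: the quoted text only points to the scheme in (Bottou, 2010) without giving its details.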