A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
Authors: Ben London
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy. |
| Researcher Affiliation | Industry | Ben London, Amazon AI (blondon@amazon.com) |
| Pseudocode | Yes | Algorithm 1: Adaptive Sampling SGD (an illustrative sketch follows the table) |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To demonstrate the effectiveness of Algorithm 1, we conducted several experiments with the CIFAR-10 dataset [12]. ... [12] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009. |
| Dataset Splits | No | This benchmark dataset contains 60,000 (32 × 32)-pixel RGB images from 10 object classes, with a standard, static partitioning into 50,000 training examples and 10,000 test examples. We tuned all hyperparameters using random subsets of the training data for cross-validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We specified the hypothesis class as the following convolutional neural network architecture: 32 (3 × 3) filters with rectified linear unit (ReLU) activations in the first and second layers, followed by (2 × 2) max-pooling and 0.25 dropout; 64 (3 × 3) filters with ReLU activations in the third and fourth layers, again followed by (2 × 2) max-pooling and 0.25 dropout; finally, a fully-connected, 512-unit layer with ReLU activations and 0.5 dropout, followed by a fully-connected, 10-output softmax layer. We trained the network using the cross-entropy loss. ... standard SGD with decreasing step sizes, η_t = η/(1 + νt) ≈ η/(νt), for η > 0 and ν > 0; and AdaGrad [5]... We used mini-batches of 100 examples per update. (A code sketch of this setup follows the table.) |
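
The Experiment Setup row quotes the network and optimizer in prose only; the paper names no framework, padding scheme, or tuned hyperparameter values. The following PyTorch rendering is therefore a minimal sketch under those assumptions (the padding choices, `eta = 0.1`, and `nu = 0.01` are illustrative placeholders), not the author's code.

```python
import torch
import torch.nn as nn

# Rough PyTorch rendering of the quoted architecture (framework and padding are assumed;
# the paper specifies neither). Input: CIFAR-10 images of shape 3 x 32 x 32.
cifar10_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),   # layers 1-2: 32 (3x3) filters
    nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout(0.25),                       # (2x2) max-pool, 0.25 dropout
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # layers 3-4: 64 (3x3) filters
    nn.Conv2d(64, 64, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(2), nn.Dropout(0.25),                       # (2x2) max-pool, 0.25 dropout
    nn.Flatten(),
    nn.Linear(64 * 6 * 6, 512), nn.ReLU(), nn.Dropout(0.5),  # fully connected, 512 units
    nn.Linear(512, 10),                                      # 10 outputs; softmax is folded
)                                                            # into the cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Decreasing step sizes eta_t = eta / (1 + nu * t); eta and nu values here are placeholders.
optimizer = torch.optim.SGD(cifar10_cnn.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda t: 1.0 / (1.0 + 0.01 * t))
# The quoted setup also compares against AdaGrad: torch.optim.Adagrad(cifar10_cnn.parameters())
```

Mini-batches of 100 examples, as quoted, would be configured in the data loader; none of these values should be read as the paper's tuned settings.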
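The Pseudocode row points to Algorithm 1 (Adaptive Sampling SGD), whose update rules are not reproduced in this table. Purely to illustrate the idea that "adaptive sampling can reduce empirical risk faster than uniform sampling," here is a hedged sketch of one adaptive-sampling SGD loop; the sampling-weight update (boosting high-loss examples) and every parameter name are assumptions, not the paper's algorithm.

```python
import numpy as np

def adaptive_sampling_sgd(grad_fn, loss_fn, w0, data, steps=1000,
                          eta=0.1, nu=0.01, boost=1.05):
    """Illustrative adaptive-sampling SGD loop (not the paper's Algorithm 1).

    grad_fn(w, x): gradient of the per-example loss at weights w.
    loss_fn(w, x): per-example loss at weights w.
    The sampling distribution q starts uniform and is nudged toward
    examples whose current loss is above average (assumed heuristic).
    """
    n = len(data)
    q = np.full(n, 1.0 / n)                   # sampling distribution over examples
    w = np.asarray(w0, dtype=float).copy()
    for t in range(1, steps + 1):
        i = np.random.choice(n, p=q)          # draw one example index from q
        step = eta / (1.0 + nu * t)           # decreasing step size, as in the quoted setup
        w -= step * grad_fn(w, data[i])       # SGD update on the sampled example
        # Assumed adaptation rule: boost weights of examples with above-average loss.
        losses = np.array([loss_fn(w, x) for x in data])   # O(n) per step; fine for a sketch
        q *= np.where(losses > losses.mean(), boost, 1.0)
        q /= q.sum()                          # renormalize to a probability distribution
    return w
```

With the distribution `q` held uniform throughout, this loop reduces to plain SGD, which is the uniform-sampling baseline the quoted experiments compare against.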