Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Path-SGD: Path-Normalized Optimization in Deep Neural Networks
Authors: Behnam Neyshabur, Russ R. Salakhutdinov, Nati Srebro
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare ℓ2-Path-SGD to two commonly used optimization methods in deep learning, SGD and AdaGrad. We conduct our experiments on four common benchmark datasets: the standard MNIST dataset of handwritten digits [8]; CIFAR-10 and CIFAR-100 datasets of tiny images of natural scenes [7]; and Street View House Numbers (SVHN) dataset containing color images of house numbers collected by Google Street View [10]. |
| Researcher Affiliation | Academia | Behnam Neyshabur Toyota Technological Institute at Chicago EMAIL Ruslan Salakhutdinov Departments of Statistics and Computer Science University of Toronto EMAIL Nathan Srebro Toyota Technological Institute at Chicago EMAIL |
| Pseudocode | Yes | Algorithm 1 Path-SGD update rule |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We conduct our experiments on four common benchmark datasets: the standard MNIST dataset of handwritten digits [8]; CIFAR-10 and CIFAR-100 datasets of tiny images of natural scenes [7]; and Street View House Numbers (SVHN) dataset containing color images of house numbers collected by Google Street View [10]. |
| Dataset Splits | Yes | To choose α, for each dataset, we considered the validation errors over the validation set (10000 randomly chosen points that are kept out during the initial training) and picked the one that reaches the minimum error faster. We then trained the network over the entire training set. All the networks were trained both with and without dropout. Details of the datasets are shown in Table 1. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | Yes | In all of our experiments, we trained feed-forward networks with two hidden layers, each containing 4000 hidden units. We used mini-batches of size 100 and the step-size of 10^(−α), where α is an integer between 0 and 10. When training with dropout, at each update step, we retained each unit with probability 0.5. In balanced initialization, incoming weights to each unit v are initialized to i.i.d samples from a Gaussian distribution with standard deviation 1/√fan-in(v). |
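The experiment setup quoted above (two hidden layers of 4000 units, mini-batches of 100, dropout with retention probability 0.5, and balanced initialization with standard deviation 1/√fan-in) can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code: the ReLU activations, MNIST-sized input dimension, and inverted-dropout scaling are assumptions chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_init(fan_in, fan_out):
    # Incoming weights to each unit are drawn i.i.d. from a Gaussian
    # with standard deviation 1/sqrt(fan-in), per the quoted setup.
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

# Two hidden layers of 4000 units each; 784-d input and 10 classes
# (MNIST-sized, an assumption for illustration).
layer_dims = [784, 4000, 4000, 10]
weights = [balanced_init(d_in, d_out)
           for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:])]

def forward(x, weights, keep_prob=0.5, train=True):
    # ReLU hidden layers; during training each unit is retained with
    # probability 0.5. Inverted-dropout scaling is a common convention,
    # assumed here rather than taken from the paper.
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)
        if train:
            mask = rng.random(h.shape) < keep_prob
            h = h * mask / keep_prob
    return h @ weights[-1]

batch = rng.normal(size=(100, 784))  # mini-batch of size 100
logits = forward(batch, weights)
```

A step size of the form 10^(−α) for integer α in [0, 10] would then be grid-searched on the held-out validation split described in the "Dataset Splits" row.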