Nonparametric Neural Networks

Authors: George Philipp, Jaime G. Carbonell

ICLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluated our framework using three standard benchmark datasets: the MNIST dataset, the rectangles images dataset and the convex dataset (Bergstra & Bengio, 2012). We started by training nonparametric networks. Through preliminary experiments, we determined a good starting angular step size for all datasets. We chose to start with αφ = 30 and repeatedly divided αφ by 3 when the validation error stopped improving. By varying the random seed, we trained 10 nets each for several values of the regularization parameter λ per dataset and then chose a typical representative from among those 10 trained nets. Results are shown in black in figure 2. |
| Researcher Affiliation | Academia | George Philipp, Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA 15213, USA. george.philipp@email.de; jgc@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: AdaRad with ℓ2 fan-in regularizer and the unit addition / removal scheme used in this paper in its most instructive (but not fastest) order of computation. |
| Open Source Code | No | The paper does not include an explicit statement or link for open-source code related to the described methodology. |
| Open Datasets | Yes | We evaluated our framework using three standard benchmark datasets: the MNIST dataset, the rectangles images dataset and the convex dataset (Bergstra & Bengio, 2012). ... This was the poker dataset http://www.openml.org/d/354. |
| Dataset Splits | Yes | train-valid split (MNIST): 50,000 / 10,000; train-valid split (rectangles images): 10,000 / 2,000; train-valid split (convex): 7,000 / 1,000; train-valid-test split (poker): 800,000 / 125,010 / 100,000 |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | Table 3: Hyperparameters and related choices. ... number of hidden layers (not poker): 2 ... αr: radial step size for AdaRad (not poker): 1/(50λ) ... ν: unit addition rate for AdaRad: 1 ... batch size: 1000 |
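
The angular step size rule quoted in the Research Type row (start at αφ = 30, divide by 3 when the validation error stops improving) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `angular_step_schedule` and the `patience` parameter are hypothetical, and the criterion for "stopped improving" (no new best validation error for `patience` consecutive epochs) is an assumption, since the quoted text does not define it.

```python
def angular_step_schedule(val_errors, alpha_start=30.0, factor=3.0, patience=2):
    """Return the angular step size in effect for each epoch.

    val_errors: validation error observed at the end of each epoch.
    alpha_start, factor: 30 and 3, as quoted from the paper.
    patience: ASSUMPTION, number of epochs without a new best error
              before the step size is divided by `factor`.
    """
    alpha = alpha_start
    best = float("inf")
    stale = 0          # epochs since the last improvement
    schedule = []
    for err in val_errors:
        schedule.append(alpha)  # step size used during this epoch
        if err < best:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                alpha /= factor  # decay applies from the next epoch on
                stale = 0
    return schedule


if __name__ == "__main__":
    errors = [0.50, 0.40, 0.41, 0.42, 0.39, 0.40, 0.40]
    print(angular_step_schedule(errors))
```

With a different `patience` the decay points shift, so any attempt to reproduce the paper's curves would need the authors' actual stopping criterion.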