AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Authors: Corinna Cortes, Xavier Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. |
| Researcher Affiliation | Collaboration | Google Research, New York, NY, USA; Courant Institute of Mathematical Sciences, New York, NY, USA. |
| Pseudocode | Yes | Figure 3. Pseudocode of the ADANET algorithm. On line 3, two candidate subnetworks are generated (e.g. randomly or by solving (6)). On lines 3 and 4, (5) is solved for each of these candidates. On lines 5-7 the best subnetwork is selected, and on lines 9-11 the termination condition is checked. (A minimal sketch of this loop appears after the table.) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it mention a specific repository link or explicit code release statement. |
| Open Datasets | Yes | In our first set of experiments, we used the CIFAR-10 dataset (Krizhevsky, 2009)... We also compared ADANET to NN on the Criteo Click Rate Prediction dataset (https://www.kaggle.com/c/criteo-display-ad-challenge). |
| Dataset Splits | Yes | In each of the experiments, we used standard 10-fold cross-validation for performance evaluation and model selection. In particular, the dataset was randomly partitioned into 10 folds, and each algorithm was run 10 times, with a different assignment of folds to the training set, validation set and test set for each run. Specifically, for each i ∈ {0, . . . , 9}, fold i was used for testing, fold i + 1 (mod 10) was used for validation, and the remaining folds were used for training. (This rotation is sketched in code after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and methods like 'ReLU', 'stochastic gradient method', 'Adam', and 'Gaussian process bandits', but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | Our algorithm admits a number of hyperparameters: regularization hyperparameters λ, β; the number of units B in each layer of new subnetworks that are used to extend the model at each iteration; and a bound Λ_k on the weights u in each unit. These hyperparameters have been optimized over the following ranges: λ ∈ {0, 10⁻⁸, 10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴}, B ∈ {100, 150, 250}, η ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}. We have used a single Λ_k for all k > 1, optimized over {1.0, 1.005, 1.01, 1.1, 1.2}. For simplicity, we chose β = 0. Neural network models also admit a learning rate η and a regularization coefficient λ as hyperparameters, as well as the number of hidden layers l and the number of units n in each hidden layer. The range of η was the same as for ADANET, and we varied l in {1, 2, 3}, n in {100, 150, 512, 1024, 2048} and λ ∈ {0, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹}. NN, NN-GP and LR are trained using the stochastic gradient method with a batch size of 100 and a maximum of 10,000 iterations. The same configuration is used for solving (6). We use T = 30 for ADANET in all our experiments, although in most cases the algorithm terminates after 10 rounds. (These grids are written out as a configuration sketch after the table.) |
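
The pseudocode row describes a per-round loop: generate candidate subnetworks, solve the convex subproblem (5) for each, keep the best one, and check a termination condition. Below is a minimal, self-contained sketch of that loop shape only; the random ReLU feature maps, the least-squares subproblem, and the names `adanet_like_loop`, `units`, and `tol` are illustrative stand-ins, not the paper's objective or implementation.

```python
import numpy as np

def adanet_like_loop(X, y, T=30, units=8, seed=0, tol=1e-6):
    """Greedy loop: each round, score two candidate subnetworks, keep the one
    that most improves the fit, and stop when the improvement falls below tol."""
    rng = np.random.default_rng(seed)
    features = np.ones((X.shape[0], 1))   # bias column; outputs of the current ensemble
    best_loss = np.inf
    for t in range(T):
        # Generate two candidate subnetworks (here: random ReLU feature maps).
        candidates = [np.maximum(X @ rng.normal(size=(X.shape[1], units)), 0.0)
                      for _ in range(2)]
        scored = []
        for h in candidates:
            F = np.hstack([features, h])
            # Stand-in for solving the convex subproblem (5): plain least squares.
            w, *_ = np.linalg.lstsq(F, y, rcond=None)
            scored.append((np.mean((F @ w - y) ** 2), h))
        loss, best_h = min(scored, key=lambda s: s[0])
        if best_loss - loss <= tol:        # termination condition
            break
        features = np.hstack([features, best_h])
        best_loss = loss
    return features, best_loss
```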
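
The split protocol in the Dataset Splits row is concrete enough to write down: fold i is the test set, fold (i + 1) mod 10 is the validation set, and the remaining folds form the training set. The sketch below assumes only a random partition into 10 folds; the function name `rotating_splits` and the use of NumPy index arrays are choices made here, not details from the paper.

```python
import numpy as np

def rotating_splits(n_examples, n_folds=10, seed=0):
    """Yield (train, valid, test) index arrays: fold i is the test set,
    fold (i + 1) mod n_folds is the validation set, the rest is training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_examples), n_folds)  # random partition
    for i in range(n_folds):
        test = folds[i]
        valid = folds[(i + 1) % n_folds]
        train = np.concatenate([f for j, f in enumerate(folds)
                                if j not in (i, (i + 1) % n_folds)])
        yield train, valid, test

# One run per fold assignment, 10 runs in total.
for train_idx, valid_idx, test_idx in rotating_splits(60000):
    pass  # train on train_idx, tune on valid_idx, report on test_idx
```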
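
For reference, the hyperparameter ranges quoted in the Experiment Setup row can be written as plain grids. The dictionary keys and the helper `iter_grid` are descriptive names chosen here; only the numeric ranges come from the paper.

```python
from itertools import product

# ADANET ranges quoted above; beta is fixed at 0 and a single Lambda_k is
# shared across all k > 1.
ADANET_GRID = {
    "lambda":        [0, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4],
    "units_B":       [100, 150, 250],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "Lambda_k":      [1.0, 1.005, 1.01, 1.1, 1.2],
    "beta":          [0.0],
}

# Ranges for the neural-network baselines (NN, NN-GP).
NN_GRID = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden_layers": [1, 2, 3],
    "units":         [100, 150, 512, 1024, 2048],
    "lambda":        [0, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
}

def iter_grid(grid):
    """Yield one dict per hyperparameter combination."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```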