AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Authors: Corinna Cortes, Xavier Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report the results of large-scale experiments with one of our algorithms on several binary classification tasks extracted from the CIFAR-10 dataset and on the Criteo dataset. |
| Researcher Affiliation | Collaboration | Google Research, New York, NY, USA; Courant Institute of Mathematical Sciences, New York, NY, USA. |
| Pseudocode | Yes | Figure 3. Pseudocode of the ADANET algorithm. On line 3, two candidate subnetworks are generated (e.g. randomly or by solving (6)). On lines 3 and 4, (5) is solved for each of these candidates. On lines 5-7 the best subnetwork is selected, and on lines 9-11 the termination condition is checked. (A minimal sketch of this loop appears after the table.) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it mention a specific repository link or explicit code release statement. |
| Open Datasets | Yes | In our first set of experiments, we used the CIFAR-10 dataset (Krizhevsky, 2009)... We also compared ADANET to NN on the Criteo Click Rate Prediction dataset (https://www.kaggle.com/c/criteo-display-ad-challenge). |
| Dataset Splits | Yes | In each of the experiments, we used standard 10-fold cross-validation for performance evaluation and model selection. In particular, the dataset was randomly partitioned into 10 folds, and each algorithm was run 10 times, with a different assignment of folds to the training set, validation set and test set for each run. Specifically, for each i ∈ {0, . . . , 9}, fold i was used for testing, fold i + 1 (mod 10) was used for validation, and the remaining folds were used for training. (This rotation is sketched in code after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components and methods like 'ReLU', 'stochastic gradient method', 'Adam', and 'Gaussian process bandits', but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | Our algorithm admits a number of hyperparameters: regularization hyperparameters λ, β; the number of units B in each layer of new subnetworks that are used to extend the model at each iteration; and a bound Λ_k on the weights u in each unit. These hyperparameters have been optimized over the following ranges: λ ∈ {0, 10⁻⁸, 10⁻⁷, 10⁻⁶, 10⁻⁵, 10⁻⁴}, B ∈ {100, 150, 250}, η ∈ {10⁻⁴, 10⁻³, 10⁻², 10⁻¹}. We have used a single Λ_k for all k > 1, optimized over {1.0, 1.005, 1.01, 1.1, 1.2}. For simplicity, we chose β = 0. Neural network models also admit a learning rate η and a regularization coefficient λ as hyperparameters, as well as the number of hidden layers l and the number of units n in each hidden layer. The range of η was the same as for ADANET, and we varied l in {1, 2, 3}, n in {100, 150, 512, 1024, 2048} and λ ∈ {0, 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹}. NN, NN-GP and LR are trained using the stochastic gradient method with a batch size of 100 and a maximum of 10,000 iterations. The same configuration is used for solving (6). We use T = 30 for ADANET in all our experiments, although in most cases the algorithm terminates after 10 rounds. (These grids are written out as a configuration sketch after the table.) |
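
The pseudocode row describes a per-round loop: generate candidate subnetworks, solve the convex subproblem (5) for each, keep the best one, and check a termination condition. Below is a minimal, self-contained sketch of that loop shape only; the random ReLU feature maps, the least-squares subproblem, and the names `adanet_like_loop`, `units`, and `tol` are illustrative stand-ins, not the paper's objective or implementation.

```python
import numpy as np

def adanet_like_loop(X, y, T=30, units=8, seed=0, tol=1e-6):
    """Greedy loop: each round, score two candidate subnetworks, keep the one
    that most improves the fit, and stop when the improvement falls below tol."""
    rng = np.random.default_rng(seed)
    features = np.ones((X.shape[0], 1))   # bias column; outputs of the current ensemble
    best_loss = np.inf
    for t in range(T):
        # Generate two candidate subnetworks (here: random ReLU feature maps).
        candidates = [np.maximum(X @ rng.normal(size=(X.shape[1], units)), 0.0)
                      for _ in range(2)]
        scored = []
        for h in candidates:
            F = np.hstack([features, h])
            # Stand-in for solving the convex subproblem (5): plain least squares.
            w, *_ = np.linalg.lstsq(F, y, rcond=None)
            scored.append((np.mean((F @ w - y) ** 2), h))
        loss, best_h = min(scored, key=lambda s: s[0])
        if best_loss - loss <= tol:        # termination condition
            break
        features = np.hstack([features, best_h])
        best_loss = loss
    return features, best_loss
```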
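
The split protocol in the Dataset Splits row is concrete enough to write down: fold i is the test set, fold (i + 1) mod 10 is the validation set, and the remaining folds form the training set. The sketch below assumes only a random partition into 10 folds; the function name `rotating_splits` and the use of NumPy index arrays are choices made here, not details from the paper.

```python
import numpy as np

def rotating_splits(n_examples, n_folds=10, seed=0):
    """Yield (train, valid, test) index arrays: fold i is the test set,
    fold (i + 1) mod n_folds is the validation set, the rest is training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_examples), n_folds)  # random partition
    for i in range(n_folds):
        test = folds[i]
        valid = folds[(i + 1) % n_folds]
        train = np.concatenate([f for j, f in enumerate(folds)
                                if j not in (i, (i + 1) % n_folds)])
        yield train, valid, test

# One run per fold assignment, 10 runs in total.
for train_idx, valid_idx, test_idx in rotating_splits(60000):
    pass  # train on train_idx, tune on valid_idx, report on test_idx
```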
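
For reference, the hyperparameter ranges quoted in the Experiment Setup row can be written as plain grids. The dictionary keys and the helper `iter_grid` are descriptive names chosen here; only the numeric ranges come from the paper.

```python
from itertools import product

# ADANET ranges quoted above; beta is fixed at 0 and a single Lambda_k is
# shared across all k > 1.
ADANET_GRID = {
    "lambda":        [0, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4],
    "units_B":       [100, 150, 250],
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "Lambda_k":      [1.0, 1.005, 1.01, 1.1, 1.2],
    "beta":          [0.0],
}

# Ranges for the neural-network baselines (NN, NN-GP).
NN_GRID = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden_layers": [1, 2, 3],
    "units":         [100, 150, 512, 1024, 2048],
    "lambda":        [0, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
}

def iter_grid(grid):
    """Yield one dict per hyperparameter combination."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))
```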