Residual Networks Behave Like Ensembles of Relatively Shallow Networks
Authors: Andreas Veit, Michael J. Wilber, Serge Belongie
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All experiments are performed at test time on CIFAR-10 [12]. Experiments on ImageNet [2] show comparable results. We train residual networks with the standard training strategy, dataset augmentation, and learning rate policy [6]. |
| Researcher Affiliation | Academia | Andreas Veit, Michael Wilber, Serge Belongie; Department of Computer Science & Cornell Tech, Cornell University; {av443, mjw285, sjb344}@cornell.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (link, explicit statement of release, or mention in supplementary materials) to source code for the methodology described. |
| Open Datasets | Yes | All experiments are performed at test time on CIFAR-10 [12]. Experiments on ImageNet [2] show comparable results. |
| Dataset Splits | No | The paper mentions using CIFAR-10 and ImageNet but does not explicitly specify training, validation, or test splits (percentages, sample counts, or citations to predefined splits); it only implicitly relies on the standard splits of these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions using the 'standard training strategy, dataset augmentation, and learning rate policy' and gives network depths (e.g., a '110-layer (54-module) residual network'), but it does not report concrete hyperparameter values or detailed training configurations such as learning rates, batch sizes, or optimizer settings (see the architecture sketch after this table). |
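For context on the architecture named in the last row, below is a minimal, hypothetical sketch of the kind of residual module the paper studies and of how a '110-layer (54-module)' CIFAR-10 network is counted. The paper releases no code and names no framework; PyTorch, the fixed channel width of 16, and the omission of the stage-wise downsampling used in [6] are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ResidualModule(nn.Module):
    """One residual module f_i, giving y_i = relu(y_{i-1} + f_i(y_{i-1}))."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual branch (two 3x3 convs) added back onto the skip connection.
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(x + out)


# 54 modules with 2 conv layers each give 108 layers; together with the initial
# conv and the final classifier this is how a "110-layer (54-module)" network
# is counted.
blocks = nn.Sequential(*[ResidualModule(16) for _ in range(54)])
x = torch.randn(1, 16, 32, 32)  # a CIFAR-10-sized feature map
print(blocks(x).shape)          # torch.Size([1, 16, 32, 32])
```

The 'standard training strategy, dataset augmentation, and learning rate policy [6]' quoted above presumably refers to the CIFAR-10 recipe of He et al. [6] (SGD with momentum 0.9, weight decay 1e-4, batch size 128, initial learning rate 0.1 divided by 10 at 32k and 48k iterations, with pad-and-crop plus horizontal-flip augmentation), but the paper itself does not restate these values, which is why the Experiment Setup variable is marked 'No'.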