Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally
Authors: Manon Verbockhaven, Théo Rudkiewicz, Sylvain Chevallier, Guillaume Charpiat
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search. |
| Researcher Affiliation | Academia | Manon Verbockhaven, Théo Rudkiewicz, Sylvain Chevallier, Guillaume Charpiat TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405, Orsay, France EMAIL |
| Pseudocode | Yes | Algorithm 1: Algorithm to plot Figures 5 and 8. 1 for each method in [TINY, Method To Compare With] do 2 Start from neural network N with initial structure s ∈ {1/4, 1/64}; 3 while N architecture does not match ResNet18 width do 4 for d in {depths to grow} do 5 θK = NewNeurons(d, method); 6 Normalize θK according to E.3; 7 Add the neurons at layer d; 8 Train N for t epochs; 9 Save model N and its performance; |
| Open Source Code | Yes | The code is available at https://gitlab.inria.fr/mverbock/tinypub. |
| Open Datasets | Yes | As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search. |
| Dataset Splits | No | Once the models have reached the final architecture ResNet18, they are trained for 250 more epochs (or 500 epochs if they have not converged yet) on the training set. We have summarized the final performance in Table 1. |
| Hardware Specification | No | The experiments were performed on 1 GPU. |
| Software Dependencies | Yes | Numerical computation was enabled by the scientific Python ecosystem: Matplotlib Hunter (2007), NumPy Harris et al. (2020), SciPy Virtanen et al. (2020), pandas pandas development team (2020), PyTorch Paszke et al. (2019). |
| Experiment Setup | Yes | The optimizer is SGD(lr = 1e-2) with starting batch size 32 (E.2). At each depth l we set the number n_l of neurons to be added at this depth. These numbers do not depend on the starting architecture and have been chosen such that each depth will reach its final width with the same number of layer extensions. For the initial structure s = 1/4, resp. 1/64, we set the number of layer extensions to 16, resp. 21, such that at depth 2 (named Conv2 in Table 3), n_2 = (Size_final_2 − Size_start_2)/nb of layer extensions = (64 − 16)/16 = (64 − 1)/21 = 3. The initial architecture is described in Table 3. |
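The growth schedule quoted above (Algorithm 1 plus the per-depth neuron count n_l) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation (available at https://gitlab.inria.fr/mverbock/tinypub); the helper names `neurons_per_extension` and `grow_to_target` are hypothetical, and the actual method computes the new neurons θK via the paper's expressivity-bottleneck criterion rather than simulating widths.

```python
def neurons_per_extension(start_width: int, final_width: int,
                          n_extensions: int) -> int:
    """n_l: neurons added per layer extension at one depth, chosen so every
    depth reaches its final width after the same number of extensions."""
    return (final_width - start_width) // n_extensions


def grow_to_target(start_width: int, final_width: int,
                   n_extensions: int) -> list:
    """Simulate one layer's width over the growth loop of Algorithm 1:
    add n_l neurons, (re)train, repeat until the target width is reached."""
    n_add = neurons_per_extension(start_width, final_width, n_extensions)
    widths = [start_width]
    while widths[-1] < final_width:
        # In the real method: theta_K = NewNeurons(d, method), normalized
        # per Appendix E.3, then N is trained for t epochs.
        widths.append(widths[-1] + n_add)
    return widths


# Conv2 example from the quoted setup (final width 64):
# s = 1/4  -> start width 16, 16 extensions; s = 1/64 -> start width 1, 21.
print(neurons_per_extension(16, 64, 16))   # 3, matching (64 - 16)/16
print(neurons_per_extension(1, 64, 21))    # 3, matching (64 - 1)/21
print(grow_to_target(16, 64, 16)[-1])      # 64
```

The arithmetic reproduces the paper's example: both starting structures add n_2 = 3 neurons per extension at Conv2, so the layer reaches width 64 after exactly 16 (resp. 21) extensions.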