Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

Authors: Manon Verbockhaven, Théo Rudkiewicz, Sylvain Chevallier, Guillaume Charpiat

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search.
Researcher Affiliation Academia Manon Verbockhaven, Théo Rudkiewicz, Sylvain Chevallier, Guillaume Charpiat TAU team, LISN, Université Paris-Saclay, CNRS, Inria, 91405, Orsay, France EMAIL
Pseudocode Yes Algorithm 1: Algorithm to plot Figures 5 and 8.
  1: for each method in [TINY, Method To Compare With] do
  2:   Start from neural network N with initial structure s ∈ {1/4, 1/64}
  3:   while N's architecture does not match the ResNet18 widths do
  4:     for d in {depths to grow} do
  5:       θ_K = NewNeurons(d, method)
  6:       Normalize θ_K according to E.3
  7:       Add the neurons at layer d
  8:     Train N for t epochs
  9:     Save model N and its performance
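The growth loop in Algorithm 1 can be illustrated with a minimal toy sketch. This is not the paper's TINY implementation: it tracks only per-layer widths and the number of neurons added per pass, and the function names and width-only representation are assumptions for illustration. Training and the neuron-selection/normalization steps are reduced to comments.

```python
def grow_widths(widths, targets, step):
    """Toy sketch of Algorithm 1's growth loop, tracking layer widths only.

    widths  -- current width per depth, e.g. {"conv2": 16}
    targets -- final (ResNet18-like) width per depth
    step    -- neurons added at each depth per layer extension

    The real method computes *which* neurons to add (NewNeurons) and
    normalizes them; here we only record how many each layer gains.
    """
    history = [dict(widths)]
    while any(widths[d] < targets[d] for d in widths):
        for d in widths:                  # loop over depths to grow
            if widths[d] < targets[d]:
                widths[d] += step[d]      # add step[d] neurons at depth d
        # (a real run would train N for t epochs here, then save it)
        history.append(dict(widths))
    return history

# Example matching the Conv2 numbers quoted below: start at width 16,
# grow by 3 neurons per extension, reach 64 after 16 extensions.
trajectory = grow_widths({"conv2": 16}, {"conv2": 64}, {"conv2": 3})
```

Under these assumptions, `trajectory` holds 17 snapshots (the initial width plus one per extension), ending at the target width of 64.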
Open Source Code Yes The code is available at https://gitlab.inria.fr/mverbock/tinypub.
Open Datasets Yes As a proof of concept, we show results on the CIFAR dataset, matching large neural network accuracy, with competitive training time, while removing the need for standard architectural hyper-parameter search.
Dataset Splits No Once the models have reached the final architecture ResNet18, they are trained for 250 more epochs (or 500 epochs if they have not converged yet) on the training set. We have summarized the final performance in Table 1.
Hardware Specification No The experiments were performed on 1 GPU.
Software Dependencies Yes Numerical computation was enabled by the scientific Python ecosystem: Matplotlib Hunter (2007), NumPy Harris et al. (2020), SciPy Virtanen et al. (2020), pandas pandas development team (2020), PyTorch Paszke et al. (2019).
Experiment Setup Yes The optimizer is SGD(lr = 1e-2) with the starting batch size 32 (see E.2). At each depth l we set the number n_l of neurons to be added at this depth. These numbers do not depend on the starting architecture and have been chosen such that each depth will reach its final width with the same number of layer extensions. For the initial structure s = 1/4, resp. 1/64, we set the number of layer extensions to 16, resp. 21, such that at depth 2 (named Conv2 in Table 3), n_2 = (Size_2^final − Size_2^start) / (nb of layer extensions) = (64 − 16)/16 = (64 − 1)/21 = 3. The initial architecture is described in Table 3.
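The arithmetic in the quoted setup can be checked directly. The snippet below is a small sketch of that calculation; the function name is an assumption, not something from the paper's code.

```python
def neurons_per_extension(final_width, start_width, n_extensions):
    """Neurons to add per layer extension so a layer reaches its final width.

    Mirrors the quoted formula:
    n_l = (Size_l^final - Size_l^start) / (nb of layer extensions).
    """
    delta = final_width - start_width
    assert delta % n_extensions == 0, "widths must divide evenly"
    return delta // n_extensions

# Conv2 reaches final width 64 in both settings:
n_quarter = neurons_per_extension(64, 16, 16)  # s = 1/4: (64 - 16)/16
n_64th    = neurons_per_extension(64, 1, 21)   # s = 1/64: (64 - 1)/21
print(n_quarter, n_64th)  # both equal 3
```

Both starting structures therefore add 3 neurons to Conv2 per extension, consistent with the quoted value n_2 = 3.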