Expand-and-Cluster: Parameter Recovery of Neural Networks
Authors: Flavio Martinelli, Berfin Simsek, Wulfram Gerstner, Johanni Brea
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate successful weights and size recovery of trained shallow and deep networks with less than 10% overhead in the layer size and describe an ease-of-identifiability axis by analysing 150 synthetic problems of variable difficulty. |
| Researcher Affiliation | Academia | 1Department of Life Sciences and Computer Sciences, EPFL, Lausanne, Switzerland. 2Center for Data Science, NYU, New York, United States. |
| Pseudocode | Yes | Algorithm: Expand-and-Cluster. Input: dataset D(X, y) generated by an unknown teacher network, L layers, activation function. Train N overparameterised student networks on D(X, y). For each layer l in 1..L: collect the layer-l weight vectors from all N students; compute L2 pairwise distances; build a dendrogram by hierarchical clustering; cut the tree to maximise the number of clusters of size at least γN; remove small clusters (size below γN); remove clusters whose median within-cluster angle exceeds β; turn the remaining clusters into hidden neurons. Reconstruct the output layer and finetune all weights. Output: network parameters and hidden layer sizes. Figure 3. Parameter identification with Expand-and-Cluster. A) Training scheme: once an overparameterisation factor yields near-zero training losses, train N overparameterised students on the teacher-generated dataset D(X, y); B) Similarity matrix: L2 distance between the hidden neurons' input weight vectors of layer l for all N students. Large clusters are good candidate weight vectors. C) Dendrogram obtained with hierarchical clustering: the selected linkage threshold is shown in orange. Clusters are eliminated if too small (blue) or unaligned (red); the remaining clusters are shown in green. The code is available at https://github.com/flavio-martinelli/expand-and-cluster. (A clustering sketch based on this pseudocode follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/flavio-martinelli/expand-and-cluster. |
| Open Datasets | Yes | To show how the procedure scales to bigger applications, we recover parameters of networks trained on the MNIST (Le Cun, 1998), Fashion MNIST (Xiao et al., 2017) and CIFAR10 (Krizhevsky et al., 2009) datasets. |
| Dataset Splits | No | The paper does not explicitly provide details about train/validation/test dataset splits with percentages or sample counts for reproducibility. |
| Hardware Specification | Yes | All of the toy model networks are trained with Float64 precision on CPU machines (Intel Xeon Gold 6132 on Linux machines). A maximum of 25k epochs was allocated to train these students on GPU machines (NVIDIA Tesla V100 32G). |
| Software Dependencies | No | The paper mentions software packages and algorithms such as MLPGradientFlow.jl, the Adam optimiser, the ODE solver KenCarp58, Newton Trust Region, BFGS, and LD_SLSQP, but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Students were initialised following the Glorot normal distribution, with mean 0 and std = sqrt(2 / (fan_in + fan_out)) (Glorot & Bengio, 2010). We allocated a fixed amount of iteration steps per student: 5000 steps of the ODE solver KenCarp58 for all networks, plus an additional 5000 steps of the exact second-order method Newton Trust Region for non-overparameterised networks (ρ = 1) or 250 steps of BFGS for overparameterised networks (ρ ≥ 2). The stopping criteria for the second training phase were: mean square error loss ≤ 10^-31 or gradient norm ‖∇L(θ(t))‖ ≤ 10^-16. The training was performed with the Adam optimiser on mini-batches of size 640 with an adaptive learning rate scheduler that reduces the learning rate after more than 100 epochs of non-decreasing training loss. A maximum of 25k epochs was allocated to train these students on GPU machines (NVIDIA Tesla V100 32G). Shallow synthetic teachers: the whole procedure was performed with N = 10 (for r = 8) or N = 20 (for r = 2, 4), γ = 0.8 and β = π/24. (A minimal training-setup sketch follows the table.) |
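To make the clustering stage quoted in the Pseudocode row concrete, here is a minimal sketch in Python. It assumes `weights` stacks the incoming weight vectors of one layer from all N students into a single array, and that `gamma` and `beta` play the roles described in the paper (cluster-size fraction and angle threshold). The linkage method, the angle-to-centroid criterion, and all function names are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Sketch of the Expand-and-Cluster clustering stage (assumptions noted above).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist


def cluster_student_neurons(weights, n_students, gamma=0.8, beta=np.pi / 24):
    """weights: (total_neurons, fan_in) array of layer-l input weight vectors
    gathered from all n_students trained students."""
    # L2 pairwise distances between neurons' input weight vectors.
    dists = pdist(weights, metric="euclidean")
    tree = linkage(dists, method="average")  # hierarchical clustering (illustrative linkage)

    # Cut the dendrogram at the height that maximises the number of clusters
    # containing at least gamma * N neurons (ideally one neuron per student).
    best_labels, best_count = None, -1
    for height in tree[:, 2]:
        labels = fcluster(tree, t=height, criterion="distance")
        sizes = np.bincount(labels)
        count = int(np.sum(sizes >= gamma * n_students))
        if count > best_count:
            best_labels, best_count = labels, count

    recovered = []
    for c in np.unique(best_labels):
        members = weights[best_labels == c]
        if len(members) < gamma * n_students:          # drop small clusters
            continue
        centre = members.mean(axis=0)
        cos = members @ centre / (
            np.linalg.norm(members, axis=1) * np.linalg.norm(centre) + 1e-12
        )
        angles = np.arccos(np.clip(cos, -1.0, 1.0))
        if np.median(angles) > beta:                   # drop unaligned clusters
            continue
        recovered.append(centre)                       # one recovered hidden neuron
    return np.array(recovered)
```

The returned cluster centres serve as candidate weight vectors for the recovered hidden layer; the output layer is then reconstructed and all weights finetuned, as described in the pseudocode.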
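The Experiment Setup row quotes Glorot normal initialisation, Adam with mini-batches of 640, and a learning-rate scheduler that backs off after 100 epochs without improvement. The PyTorch sketch below reproduces only those quoted choices; the network shape, activation, base learning rate, and decay factor are placeholders not taken from the paper.

```python
# Minimal training-setup sketch for the overparameterised students (placeholders noted above).
import torch
import torch.nn as nn

# Placeholder architecture; the paper trains students of varying widths and depths.
student = nn.Sequential(nn.Linear(784, 256), nn.Tanh(), nn.Linear(256, 10))

# Glorot normal initialisation: std = sqrt(2 / (fan_in + fan_out)), mean 0.
for m in student.modules():
    if isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)  # base lr is a placeholder
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=100  # reduce lr after 100 stale epochs
)

# Training loop (per epoch): iterate over mini-batches of size 640, compute the MSE
# loss against the teacher-generated targets, step the optimiser, then call
# scheduler.step(epoch_loss) so the learning rate drops when the loss stops decreasing.
```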