Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

Authors: David A. R. Robin, Kevin Scaman, Marc Lelarge

Venue: ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We include in this section several preliminary experiments to evaluate empirical predictions based on this theory." and, from Section N.2 (Sparse versus dense lifts in classification), "In the context of classification, for the MNIST digit-recognition dataset, we perform experiments with dense and sparse multi-layer perceptrons to check whether there exists a fundamental difference which would prevent the extension of this theory to the dense setting."
Researcher Affiliation | Academia | David A. R. Robin (INRIA, ENS Paris, PSL Research University); Kevin Scaman (INRIA, ENS Paris, PSL Research University); Marc Lelarge (INRIA, ENS Paris, PSL Research University)
Pseudocode | No | The paper describes methods and architectures verbally and with diagrams, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statements about the release of source code or links to code repositories for the methodology described.
Open Datasets | Yes | "In the context of classification, for the MNIST digit-recognition dataset, we perform experiments..." and "We consider the one-dimensional regression task of learning the target function f : R → R..."
Dataset Splits | No | "The training set consists of n = 10^4 input points independently sampled uniformly at random from the interval [0, 100], together with the corresponding value for f. We use the quadratic loss on R and the models described in the following section. We train each model for 10^5 iterations with training samples grouped by batches of 10, taken uniformly at random in the training set with replacement, with the Adam optimizer for a step size of 10^−2." For MNIST, the paper mentions "measure the test accuracy of the resulting models after 10^5 training iterations" but does not provide split details. (This regression setup is sketched in code below the table.)
Hardware Specification | No | The paper describes experimental setups but does not specify any hardware components (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers and activation functions but does not specify versions for any software dependencies like programming languages or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | "We train each model for 10^5 iterations with training samples grouped by batches of 10, taken uniformly at random in the training set with replacement, with the Adam optimizer for a step size of 10^−2." and "We train several models on the MNIST digit recognition dataset, using the cross-entropy loss and the Adam optimizer with a step-size of 10^−3 by batches of 100 samples taken with replacement..." and "We initialize each weight matrix W ∈ R^{n×m} with independent identically distributed entries, with a Gaussian distribution of mean zero and variance 1/n, also known as Kaiming He's normal initialization with fan-in mode, and biases with normal distributions." (The quoted initialization is sketched in code below the table.)
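
The initialization quoted under Experiment Setup can be illustrated with a minimal sketch. The paper does not name a deep-learning framework, so PyTorch is assumed here, and the layer widths are placeholders; the N(0, 1/n) fan-in scheme is written out explicitly because PyTorch's `kaiming_normal_` applies a ReLU gain by default and would give variance 2/n instead.

```python
# Minimal sketch of the quoted initialization (framework not stated in the
# paper; PyTorch assumed). Each weight matrix W in R^{n x m} gets i.i.d.
# N(0, 1/n) entries, i.e. fan-in normal initialization with gain 1, and
# biases are drawn from a normal distribution (scale not specified).
import torch.nn as nn


def init_fan_in_normal(module: nn.Module) -> None:
    """Initialize Linear layers with N(0, 1/fan_in) weights and normal biases."""
    if isinstance(module, nn.Linear):
        fan_in = module.in_features  # plays the role of n in W in R^{n x m}
        nn.init.normal_(module.weight, mean=0.0, std=fan_in ** -0.5)
        if module.bias is not None:
            nn.init.normal_(module.bias, mean=0.0, std=1.0)  # assumed bias scale


# Example with placeholder widths (not taken from the paper).
mlp = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
mlp.apply(init_fan_in_normal)
```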
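
The one-dimensional regression setup quoted under Dataset Splits (10^4 inputs uniform on [0, 100], quadratic loss, 10^5 Adam iterations at step size 10^−2, batches of 10 drawn with replacement) could then look roughly as follows. The target function f and the model architecture are not specified in the excerpts, so both appear below as labeled placeholders.

```python
# Rough sketch of the quoted regression training loop, assuming PyTorch.
# Both `target_fn` and `model` are placeholders: the excerpts do not give
# the actual target function f or the network architecture.
import torch

n_train, n_iters, batch_size = 10_000, 100_000, 10

target_fn = lambda x: x                      # placeholder for the paper's f : R -> R
model = torch.nn.Sequential(                 # placeholder architecture
    torch.nn.Linear(1, 128), torch.nn.ReLU(), torch.nn.Linear(128, 1)
)

# Training set: 10^4 inputs sampled uniformly at random from [0, 100].
x_train = 100.0 * torch.rand(n_train, 1)
y_train = target_fn(x_train)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # step size 10^-2
loss_fn = torch.nn.MSELoss()                               # quadratic loss on R

for _ in range(n_iters):
    # Batches of 10 samples drawn uniformly at random, with replacement.
    idx = torch.randint(0, n_train, (batch_size,))
    loss = loss_fn(model(x_train[idx]), y_train[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```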