Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks
Authors: David A. R. Robin, Kevin Scaman, Marc Lelarge
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We include in this section several preliminary experiments to evaluate empirical predictions based on this theory." and, from Section N.2 ("Sparse versus Dense Lifts in Classification"): "In the context of classification, for the MNIST digit-recognition dataset, we perform experiments with dense and sparse multi-layer perceptrons to check whether there exists a fundamental difference which would prevent the extension of this theory to the dense setting." |
| Researcher Affiliation | Academia | David A. R. Robin (INRIA, ENS Paris, PSL Research University); Kevin Scaman (INRIA, ENS Paris, PSL Research University); Marc Lelarge (INRIA, ENS Paris, PSL Research University) |
| Pseudocode | No | The paper describes methods and architectures verbally and with diagrams, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statements about the release of source code or links to code repositories for the methodology described. |
| Open Datasets | Yes | "In the context of classification, for the MNIST digit-recognition dataset, we perform experiments..." and "We consider the one-dimensional regression task of learning the target function f : R → R..." |
| Dataset Splits | No | "The training set consists of n = 10^4 input points independently sampled uniformly at random from the interval [0, 100], together with the corresponding value for f. We use the quadratic loss on R and the models described in the following section. We train each model for 10^5 iterations with training samples grouped by batches of 10, taken uniformly at random in the training set with replacement, with the Adam optimizer for a step size of 10^−2." For MNIST, the paper mentions that it "measure[s] the test accuracy of the resulting models after 10^5 training iterations" but gives no train/validation/test split details. (See the regression sketch after this table.) |
| Hardware Specification | No | The paper describes experimental setups but does not specify any hardware components (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers and activation functions but does not specify versions for any software dependencies like programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | "We train each model for 10^5 iterations with training samples grouped by batches of 10, taken uniformly at random in the training set with replacement, with the Adam optimizer for a step size of 10^−2." and "We train several models on the MNIST digit recognition dataset, using the cross-entropy loss and the Adam optimizer with a step-size of 10^−3 by batches of 100 samples taken with replacement..." and "We initialize each weight matrix W ∈ R^n×m with independent identically distributed entries, with a gaussian distribution of mean zero and variance 1/n, also known as Kaiming He's normal initialization with fan-in mode, and biases with normal distributions." (See the initialization and training sketch after this table.) |
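
The regression configuration quoted in the Dataset Splits row translates into roughly the following training-loop sketch. Only the sample count, sampling interval, quadratic loss, batch size, sampling with replacement, iteration count, optimizer, and step size come from the quoted text; the target function `target_f`, the model architecture, and all widths are placeholders, since the paper does not fix them in the excerpt above.

```python
# Hypothetical sketch of the quoted 1-D regression setup; the target function
# and model below are placeholders, not the paper's.
import torch
import torch.nn as nn

def target_f(x):
    # Placeholder for the paper's target f : R -> R (not reproduced here).
    return torch.sin(x)

# Training set: n = 10^4 points sampled uniformly at random from [0, 100].
n = 10_000
x_train = torch.rand(n, 1) * 100.0
y_train = target_f(x_train)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in model
loss_fn = nn.MSELoss()                                     # quadratic loss on R
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # step size 10^-2

for step in range(100_000):                                # 10^5 iterations
    # Batches of 10 samples, drawn uniformly at random with replacement.
    idx = torch.randint(0, n, (10,))
    optimizer.zero_grad()
    loss = loss_fn(model(x_train[idx]), y_train[idx])
    loss.backward()
    optimizer.step()
```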
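
Similarly, the initialization and MNIST configuration quoted in the Experiment Setup row can be sketched as follows. The fan-in Kaiming-style normal weights (mean zero, variance 1/fan_in), normally distributed biases, cross-entropy loss, Adam step size 10^−3, and batch size of 100 follow the quote; the layer widths, the bias scale, and the data loading are assumptions and would need to match the paper's actual architectures.

```python
# Hypothetical sketch of the quoted initialization and MNIST training config;
# layer widths, bias scale, and data loading are assumptions.
import math
import torch
import torch.nn as nn

def init_kaiming_fan_in(module):
    """Gaussian weights with mean 0 and variance 1/fan_in (Kaiming normal,
    fan-in mode, gain 1) and normally distributed biases, as quoted above."""
    if isinstance(module, nn.Linear):
        fan_in = module.in_features
        nn.init.normal_(module.weight, mean=0.0, std=1.0 / math.sqrt(fan_in))
        nn.init.normal_(module.bias, mean=0.0, std=1.0)  # bias scale is an assumption

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),   # widths are illustrative, not from the paper
    nn.Linear(256, 10),
)
model.apply(init_kaiming_fan_in)

loss_fn = nn.CrossEntropyLoss()                            # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # step size 10^-3

def train(images, labels, iterations=100_000):
    """Batches of 100 MNIST samples drawn uniformly with replacement.
    `images` (N, 1, 28, 28) and `labels` (N,) are assumed to be loaded
    separately, e.g. via torchvision.datasets.MNIST."""
    N = labels.shape[0]
    for step in range(iterations):
        idx = torch.randint(0, N, (100,))
        optimizer.zero_grad()
        loss = loss_fn(model(images[idx]), labels[idx])
        loss.backward()
        optimizer.step()
```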