A Theoretical View on Sparsely Activated Networks

Authors: Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To support our theory, we perform experiments in Section 5 on approximating Lipschitz functions with sparse networks. We identify several synthetic datasets where models with data-dependent sparse layers outperform dense models of the same size. Moreover, we achieve these results with relatively small networks."
Researcher Affiliation | Industry | Cenk Baykal (Google Research), Nishanth Dikkala (Google Research), Rina Panigrahy (Google Research), Cyrus Rashtchian (Google Research), Xin Wang (Google Research)
Pseudocode | No | The paper describes models and algorithms in text and mathematical formulations but does not include structured pseudocode blocks or algorithm listings.
Open Source Code | Yes | "We include details to reproduce all datasets and experiments."
Open Datasets | Yes | "On CIFAR-10, we also see that the DSM model performs comparably or better than the dense network."
Dataset Splits | No | The paper mentions training on CIFAR-10 and evaluating on the test dataset, but it does not specify the train/validation/test split percentages or sample counts.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or specific cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions using the ADAM optimizer, but does not provide specific software names with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | "Both models are trained with ADAM optimizer for 50 epochs and evaluated on the test dataset for model accuracy with no data augmentation." (A minimal training sketch follows this table.)
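The Research Type and Experiment Setup rows above quote the paper's use of data-dependent sparse layers and its training recipe (Adam optimizer, 50 epochs, CIFAR-10 test-set evaluation, no data augmentation). The following is a minimal sketch of that setup under stated assumptions: it uses PyTorch (the paper does not name a framework), a top-k activation rule as a stand-in for the paper's DSM sparsity mechanism (whose exact construction may differ), and hypothetical width, k, batch-size, and learning-rate values. It is illustrative only, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a small MLP with a data-dependent
# top-k sparse hidden layer, trained on CIFAR-10 with Adam for 50 epochs and
# no data augmentation. Hyperparameters (width, k, lr, batch size) are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


class TopKSparseLayer(nn.Module):
    """Linear layer that keeps only the k largest activations per input."""

    def __init__(self, in_features: int, out_features: int, k: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.linear(x)
        # Data-dependent sparsity: which units stay active depends on the input.
        _, topk_idx = z.topk(self.k, dim=-1)
        mask = torch.zeros_like(z).scatter_(-1, topk_idx, 1.0)
        return F.relu(z) * mask


class SparseMLP(nn.Module):
    def __init__(self, width: int = 512, k: int = 64, num_classes: int = 10):
        super().__init__()
        self.flatten = nn.Flatten()
        self.sparse = TopKSparseLayer(3 * 32 * 32, width, k)
        self.head = nn.Linear(width, num_classes)

    def forward(self, x):
        return self.head(self.sparse(self.flatten(x)))


def train_and_evaluate(epochs: int = 50, batch_size: int = 128, lr: float = 1e-3):
    # No data augmentation: only tensor conversion.
    tfm = transforms.ToTensor()
    train_set = datasets.CIFAR10("data", train=True, download=True, transform=tfm)
    test_set = datasets.CIFAR10("data", train=False, download=True, transform=tfm)
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=batch_size)

    model = SparseMLP()
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            opt.step()

    # Evaluate accuracy on the held-out test set.
    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in test_loader:
            correct += (model(images).argmax(dim=-1) == labels).sum().item()
    return correct / len(test_set)
```

A dense baseline of the same size can be obtained by replacing TopKSparseLayer with an ordinary nn.Linear followed by ReLU, which mirrors the paper's comparison of sparse and dense models of equal width.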