A Theoretical View on Sparsely Activated Networks
Authors: Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To support our theory, we perform experiments in Section 5 on approximating Lipschitz functions with sparse networks. We identify several synthetic datasets where models with data-dependent sparse layers outperform dense models of the same size. Moreover, we achieve these results with relatively small networks. |
| Researcher Affiliation | Industry | Cenk Baykal (Google Research), Nishanth Dikkala (Google Research), Rina Panigrahy (Google Research), Cyrus Rashtchian (Google Research), Xin Wang (Google Research) |
| Pseudocode | No | The paper describes models and algorithms in text and mathematical formulations but does not include structured pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | We include details to reproduce all datasets and experiments. |
| Open Datasets | Yes | On CIFAR-10, we also see that the DSM model performs comparably or better than the dense network. |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and evaluating on the test dataset, but it does not specify the train/validation/test split percentages or sample counts. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or specific cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer, but it does not name specific software packages with version numbers needed for reproducibility (e.g., Python, PyTorch, or TensorFlow versions). |
| Experiment Setup | Yes | Both models are trained with ADAM optimizer for 50 epochs and evaluated on the test dataset for model accuracy with no data augmentation. |
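The Research Type row above quotes the paper's claim that models with data-dependent sparse layers outperform dense models of the same size. As a rough illustration only, here is a minimal top-k sparsely activated layer in Python/NumPy; the function name `sparse_layer`, the tensor shapes, and the top-k selection rule are assumptions made for exposition, not the paper's exact DSM construction.

```python
# Minimal sketch (not the paper's exact DSM construction): a dense layer whose
# activation pattern is data-dependent -- for each input, only the k units with
# the largest pre-activations are kept, and the rest are zeroed out.
import numpy as np

def sparse_layer(x, W, b, k):
    """Apply a linear map, then keep only the top-k pre-activations per input.

    x: (batch, d_in) inputs
    W: (d_in, d_out) weights
    b: (d_out,) biases
    k: number of units left active for each input (data-dependent sparsity)
    """
    z = x @ W + b                                   # (batch, d_out) pre-activations
    # Indices of the k largest pre-activations in each row.
    topk_idx = np.argpartition(z, -k, axis=1)[:, -k:]
    mask = np.zeros_like(z)
    np.put_along_axis(mask, topk_idx, 1.0, axis=1)
    return np.maximum(z, 0.0) * mask                # ReLU on the surviving units only

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    W = rng.normal(size=(8, 32))
    b = np.zeros(32)
    out = sparse_layer(x, W, b, k=4)
    print((out != 0).sum(axis=1))  # at most 4 active units per input
```

Only k of the 32 units fire for any given input, and which k depends on the input itself, which is what "data-dependent sparsity" refers to in the quoted passage.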
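The Experiment Setup row quotes training with the ADAM optimizer for 50 epochs, evaluation on the test set, and no data augmentation. The sketch below shows how that recipe could look in code; the choice of PyTorch, the learning rate, the batch size, and the stand-in MLP are all assumptions, since the paper does not report software versions, hyperparameters beyond those quoted, or hardware.

```python
# Hedged sketch of the quoted training recipe (ADAM, 50 epochs, no data
# augmentation, accuracy measured on the CIFAR-10 test set). The paper does not
# name a framework; PyTorch is assumed here, and the model passed in is a
# stand-in, not the paper's DSM architecture.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def run(model, epochs=50, batch_size=128, lr=1e-3, device="cpu"):
    transform = T.ToTensor()  # no augmentation, per the quoted setup
    train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size)

    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # learning rate is an assumed value
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # Evaluate accuracy on the held-out test set, again with no augmentation.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = model(x.to(device)).argmax(dim=1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total

if __name__ == "__main__":
    mlp = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, 10))
    print(f"test accuracy: {run(mlp, epochs=1):.3f}")  # epochs=1 for a quick smoke test
```

Because details such as splits, hardware, and software versions are not reported (see the Dataset Splits, Hardware Specification, and Software Dependencies rows), any reproduction attempt would need to make similar assumptions explicit.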