On the Expected Complexity of Maxout Networks
Authors: Hanna Tseran, Guido F. Montúfar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate different parameter initialization procedures and show that they can increase the speed of convergence in training. |
| Researcher Affiliation | Academia | Hanna Tseran, Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany, hanna.tseran@mis.mpg.de; Guido Montúfar, Department of Mathematics and Department of Statistics, UCLA, Los Angeles, CA 90095, USA; Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany, montufar@math.ucla.edu |
| Pseudocode | Yes | Algorithm for counting activation regions: Several approaches for counting linear regions of ReLU networks have been considered (e.g., Serra et al., 2018; Hanin and Rolnick, 2019b; Serra and Ramalingam, 2020; Xiong et al., 2020). For maxout networks, we count the activation regions and pieces of the decision boundary by iteratively adding linear inequality constraints and verifying feasibility using linear programming. Pseudocode and complexity analysis are provided in Appendix I. (A minimal LP feasibility sketch is given below the table.) |
| Open Source Code | Yes | The computer implementation of the key functions is available on GitHub at https://github.com/hanna-tseran/maxout_complexity. |
| Open Datasets | Yes | We consider the 10-class classification task with the MNIST dataset (LeCun et al., 2010). |
| Dataset Splits | Yes | We use the standard train/validation/test split of 50000/10000/10000, respectively. |
| Hardware Specification | Yes | All experiments were run on an NVIDIA RTX A6000 GPU or on the Max Planck Computing and Data Facility (MPCDF) cluster. |
| Software Dependencies | No | The paper mentions general software like PyTorch, NumPy, SciPy, Matplotlib, and Python but does not provide specific version numbers for any of these dependencies required for reproduction. |
| Experiment Setup | Yes | We train our networks on the MNIST dataset, with batch size 100, for 100 epochs, using the Adam optimizer with learning rate 0.001. (A hedged training-setup sketch follows the table.) |
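
The pseudocode row above describes counting activation regions by adding linear inequality constraints and checking feasibility with linear programming. As a minimal sketch (not the authors' Appendix I algorithm), the snippet below checks whether a candidate activation pattern of a single maxout layer defines a non-empty region using `scipy.optimize.linprog`; the function name `pattern_is_feasible` and the bounding box on the input are illustrative assumptions.

```python
# Hedged sketch: LP feasibility check for one candidate activation pattern of a
# single maxout layer. Illustrative only; the paper's actual pseudocode is in
# its Appendix I and its implementation is in the linked GitHub repository.
import numpy as np
from scipy.optimize import linprog

def pattern_is_feasible(W, b, pattern, box=10.0):
    """W: (units, rank, in_dim) weights, b: (units, rank) biases.
    pattern[k] = index of the pre-activation assumed to attain the max in unit k.
    Returns True if some x in [-box, box]^in_dim satisfies
    W[k, pattern[k]] x + b[k, pattern[k]] >= W[k, j] x + b[k, j] for all j, k."""
    units, rank, in_dim = W.shape
    A_ub, b_ub = [], []
    for k in range(units):
        w_max, c_max = W[k, pattern[k]], b[k, pattern[k]]
        for j in range(rank):
            if j == pattern[k]:
                continue
            # Rewrite the inequality as (W[k, j] - w_max) x <= c_max - b[k, j].
            A_ub.append(W[k, j] - w_max)
            b_ub.append(c_max - b[k, j])
    res = linprog(c=np.zeros(in_dim), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(-box, box)] * in_dim, method="highs")
    return res.status == 0  # status 0 means a feasible point was found

# Tiny usage example with random weights (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2, 2))   # 3 maxout units of rank 2 on 2D inputs
b = rng.normal(size=(3, 2))
print(pattern_is_feasible(W, b, pattern=[0, 1, 0]))
```

Iterating this check over candidate patterns, adding one unit's constraints at a time and pruning infeasible branches early, is the general shape of the counting procedure the row describes.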
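The reported setup (MNIST, 50000/10000/10000 split, batch size 100, 100 epochs, Adam with learning rate 0.001) can be expressed as a short PyTorch sketch. The network widths, maxout rank, and split mechanics below are assumptions for illustration; the authors' actual implementation is at the GitHub link above.

```python
# Hedged sketch of the reported training setup. Architecture details (widths,
# rank 2) and the random_split call are illustrative, not the authors' code.
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

class Maxout(nn.Module):
    """Maxout layer: maximum over `rank` affine maps per output unit."""
    def __init__(self, in_dim, out_dim, rank=2):
        super().__init__()
        self.rank, self.out_dim = rank, out_dim
        self.linear = nn.Linear(in_dim, out_dim * rank)

    def forward(self, x):
        z = self.linear(x).view(x.shape[0], self.out_dim, self.rank)
        return z.max(dim=-1).values

train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, transform=transforms.ToTensor())
train_set, val_set = random_split(train_full, [50000, 10000])  # 50000/10000 split

model = nn.Sequential(nn.Flatten(), Maxout(784, 100), Maxout(100, 100),
                      nn.Linear(100, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported learning rate
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(train_set, batch_size=100, shuffle=True)  # reported batch size

for epoch in range(100):  # reported number of epochs
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```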