On the Expected Complexity of Maxout Networks

Authors: Hanna Tseran, Guido F. Montúfar

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We investigate different parameter initialization procedures and show that they can increase the speed of convergence in training.
Researcher Affiliation | Academia | Hanna Tseran, Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany, hanna.tseran@mis.mpg.de; Guido Montúfar, Department of Mathematics and Department of Statistics, UCLA, Los Angeles, CA 90095, USA, and Max Planck Institute for Mathematics in the Sciences, 04103 Leipzig, Germany, montufar@math.ucla.edu
Pseudocode | Yes | Algorithm for counting activation regions. Several approaches for counting linear regions of ReLU networks have been considered (e.g., Serra et al., 2018; Hanin and Rolnick, 2019b; Serra and Ramalingam, 2020; Xiong et al., 2020). For maxout networks we count the activation regions and pieces of the decision boundary by iterative addition of linear inequality constraints and feasibility verification using linear programming. Pseudocode and complexity analysis are provided in Appendix I. (A sketch of this feasibility-based counting appears after the table.)
Open Source Code | Yes | The computer implementation of the key functions is available on GitHub at https://github.com/hanna-tseran/maxout_complexity.
Open Datasets | Yes | We consider the 10-class classification task with the MNIST dataset (LeCun et al., 2010).
Dataset Splits | Yes | We use the standard train/validation/test split of 50000/10000/10000, respectively.
Hardware Specification | Yes | All experiments were run on an NVIDIA RTX A6000 GPU or on the Max Planck Computing and Data Facility (MPCDF) cluster.
Software Dependencies | No | The paper mentions general software such as PyTorch, NumPy, SciPy, Matplotlib, and Python, but does not provide the specific version numbers required for reproduction.
Experiment Setup | Yes | We train our networks on the MNIST dataset, with batch size 100, for 100 epochs, using the Adam optimizer with learning rate 0.001. (A training sketch matching this setup appears after the table.)
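
To make the feasibility-based counting procedure concrete, below is a minimal sketch, not the authors' implementation from the linked repository, that counts the activation regions of a single maxout layer: each candidate activation pattern contributes a set of linear inequalities, and a region is counted when a linear program finds a point satisfying them with positive slack. The layer sizes, maxout rank, and slack tolerance are illustrative choices.

```python
# Sketch: count activation regions of one maxout layer by LP feasibility.
# Assumes scipy and numpy only; the rank-2, 3-unit example at the bottom is illustrative.
import itertools
import numpy as np
from scipy.optimize import linprog

def count_activation_regions(W, b, tol=1e-7):
    """W: (units, K, d) weights, b: (units, K) biases of one maxout layer, rank K >= 2.

    An activation pattern assigns to every unit the index of its maximizing affine map.
    The pattern's region consists of the inputs x satisfying, for each unit j with
    chosen index k:  W[j,k].x + b[j,k] >= W[j,k'].x + b[j,k']  for all k' != k.
    We count a pattern if these inequalities admit a point with positive slack,
    which we test by maximizing an auxiliary slack variable t with a linear program.
    """
    units, K, d = W.shape
    count = 0
    for pattern in itertools.product(range(K), repeat=units):
        A_rows, rhs = [], []
        for j, k in enumerate(pattern):
            for k_other in range(K):
                if k_other == k:
                    continue
                # (W[j,k'] - W[j,k]) . x + t <= b[j,k] - b[j,k']
                A_rows.append(np.append(W[j, k_other] - W[j, k], 1.0))
                rhs.append(b[j, k] - b[j, k_other])
        c = np.zeros(d + 1)
        c[-1] = -1.0                                  # maximize the slack t
        bounds = [(None, None)] * d + [(None, 1.0)]   # cap t so the LP stays bounded
        res = linprog(c, A_ub=np.array(A_rows), b_ub=np.array(rhs),
                      bounds=bounds, method="highs")
        if res.status == 0 and -res.fun > tol:        # strictly feasible -> nonempty region
            count += 1
    return count

# Example: 3 maxout units of rank 2 on 2-dimensional inputs (8 candidate patterns).
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 2, 2))
b = rng.standard_normal((3, 2))
print(count_activation_regions(W, b))
```

This brute-force enumeration is exponential in the number of units; the paper's Appendix I describes the iterative constraint-addition procedure and its complexity, which the repository implements.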
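The reported training setup can likewise be summarized in a short sketch. It follows the stated hyperparameters (MNIST, 50000/10000/10000 split, batch size 100, 100 epochs, Adam with learning rate 0.001), but the fully connected maxout architecture, rank, and layer widths are assumptions for illustration, and the default PyTorch initialization is used rather than the maxout-adjusted initializations studied in the paper.

```python
# Sketch of the reported training setup; architecture details are assumed, not taken from the paper.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

class MaxoutLayer(nn.Module):
    """Maxout unit: the maximum over `rank` affine maps of the input."""
    def __init__(self, in_features, out_features, rank=2):
        super().__init__()
        self.rank = rank
        self.linear = nn.Linear(in_features, out_features * rank)

    def forward(self, x):
        z = self.linear(x)                      # (batch, out * rank)
        z = z.view(x.shape[0], -1, self.rank)   # (batch, out, rank)
        return z.max(dim=-1).values             # (batch, out)

model = nn.Sequential(
    nn.Flatten(),
    MaxoutLayer(28 * 28, 128),
    MaxoutLayer(128, 128),
    nn.Linear(128, 10),
)

transform = transforms.ToTensor()
full_train = datasets.MNIST("data", train=True, download=True, transform=transform)
train_set, val_set = random_split(full_train, [50000, 10000])   # standard 50000/10000 split
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_set, batch_size=100, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    # Evaluation on val_set / test_set omitted for brevity.
```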