A Combinatorial Perspective on the Optimization of Shallow ReLU Networks

Authors: Michael S. Matena, Colin A. Raffel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We ran experiments comparing batch gradient descent to solving (3) for a randomly chosen vertex on some toy datasets. We present some of our results in fig. 2.
Researcher Affiliation | Academia | Michael Matena, Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, mmatena@cs.unc.edu; Colin Raffel, Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, craffel@cs.unc.edu
Pseudocode | Yes | Algorithm 1: Exact ERM (Arora et al., 2016) ... Algorithm 2: Greedy Local Search (GLS) Heuristic
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We have included this in the supplemental material.
Open Datasets | Yes | We also created toy binary classification datasets from MNIST (LeCun et al., 2010) and Fashion MNIST (Xiao et al., 2017).
Dataset Splits | No | The paper mentions 'training sets' but does not explicitly describe train/validation/test splits or mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) used for the experiments. The authors' checklist acknowledges: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No]'
Software Dependencies | No | The paper mentions software libraries such as 'CVXPY', 'ECOS', 'scikit-learn', 'PyTorch', and 'NumPy', but does not specify their version numbers. (A version-recording sketch follows the table.)
Experiment Setup | Yes | See appendix H for details of the training procedures and for results on more (d, m_gen) and (d, N) pairs. ... We used a batch size of 128 and a learning rate of 10^-3. We trained for 1000 epochs using the Adam optimizer. (A training-loop sketch based on these values follows the table.)
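The Software Dependencies row notes that library versions are unpinned. A minimal Python sketch for a reproduction log follows; it simply prints the installed versions of the libraries the paper names. The PyPI distribution names are assumptions (e.g., PyTorch installs as "torch", scikit-learn as "scikit-learn").

    # Record installed versions of the libraries named in the paper,
    # since the paper itself does not pin them.
    import importlib.metadata as md

    for pkg in ("cvxpy", "ecos", "scikit-learn", "torch", "numpy"):
        try:
            print(f"{pkg}=={md.version(pkg)}")
        except md.PackageNotFoundError:
            print(f"{pkg}: not installed")

Running this in the reproduction environment yields a requirements-style listing that can be attached to the run details.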
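The Experiment Setup row reports a batch size of 128, a learning rate of 10^-3, 1000 epochs, and the Adam optimizer. The sketch below assembles those values into a minimal PyTorch training loop for a one-hidden-layer ReLU network; the hidden width, the squared-error loss, and the data handling are assumptions not taken from the paper (see appendix H of the paper for the authors' actual procedure).

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    def train_shallow_relu(X, y, hidden=64, epochs=1000, lr=1e-3, batch_size=128):
        # X: (N, d) float tensor of inputs; y: (N,) float tensor of targets.
        loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
        model = nn.Sequential(
            nn.Linear(X.shape[1], hidden),  # single hidden layer of ReLU units
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        opt = torch.optim.Adam(model.parameters(), lr=lr)  # lr = 1e-3 as reported
        loss_fn = nn.MSELoss()  # assumption: squared-error loss
        for _ in range(epochs):  # 1000 epochs as reported
            for xb, yb in loader:
                opt.zero_grad()
                loss = loss_fn(model(xb).squeeze(-1), yb)
                loss.backward()
                opt.step()
        return model

For example, train_shallow_relu(torch.randn(1000, 784), (torch.rand(1000) > 0.5).float(), epochs=10) fits the sketch on random MNIST-sized data.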