Scaling Convex Neural Networks with Burer-Monteiro Factorization
Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, John M. Pauly, Morteza Mardani, Mert Pilanci
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments with image classification task indicate that this BM factorization enables layerwise training of convex CNNs, which allows for convex networks for the first time to match the performance of multi-layer end-to-end trained non-convex CNNs." (see also Section 4, EXPERIMENTAL RESULTS) |
| Researcher Affiliation | Collaboration | Arda Sahiner (Arcus Inc.; Stanford University), Tolga Ergen (LG AI Research), Batu Ozturkler (Stanford University), John Pauly (Stanford University), Morteza Mardani (NVIDIA Corporation), Mert Pilanci (Stanford University) |
| Pseudocode | No | The paper describes methods and processes through mathematical formulations and prose, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, such as a specific repository link, an explicit code release statement, or code in supplementary materials. |
| Open Datasets | Yes | We apply this procedure to the CIFAR-10 (Krizhevsky et al., 2009) and Fashion-MNIST (Xiao et al., 2017) datasets |
| Dataset Splits | No | The paper mentions 'test accuracy' and refers to 'training' on CIFAR-10 and Fashion-MNIST, but does not explicitly provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | Our layerwise training procedure was trained on a single NVIDIA 1080 Ti GPU |
| Software Dependencies | No | The paper mentions 'Pytorch (Paszke et al., 2019)' but does not provide specific version numbers for PyTorch or any other ancillary software components used in the experiments. |
| Experiment Setup | Yes | In our experiments, we keep all network and optimization parameters the same, aside from replacing the non-convex CNN at each stage with our convex CNN objective (23). We then apply the Burer-Monteiro factorization with m ∈ [1, 2, 4] to this architecture to make it tractable for layerwise learning as described in the main paper. At each stage, we randomly subsample P̂ = 256 hyperplane arrangements. We further use gated ReLU rather than ReLU activations for simplicity, which can work as well as ReLU in practice (Fiat et al., 2019). ...a batch size of 128, weight decay parameter of β = 5e-4, along with stochastic gradient descent (SGD) with momentum fixed to 0.9, 50 epochs per stage, and learning rate decay by a factor of 0.2 every 15 epochs. ...The chosen learning rates were [10^-1, 10^-2, 10^-3, 10^-2, 10^-2] for CIFAR-10. For Fashion-MNIST, we empirically observed the training loss was better optimized with slightly higher learning rates, so we used [2×10^-1, 5×10^-2, 5×10^-3]. (A hedged sketch of these optimization settings follows the table.) |
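
For concreteness, below is a minimal PyTorch sketch of the per-stage optimization settings quoted in the Experiment Setup row (batch size 128, SGD with momentum 0.9, weight decay 5e-4, 50 epochs per stage, learning rate decayed by 0.2 every 15 epochs, stage-wise learning rates for CIFAR-10). It is not the authors' released code: `build_stage_model` is a placeholder for one convex-CNN stage, the convex objective (23), Burer-Monteiro factorization, hyperplane-arrangement subsampling, gated ReLU activations, and the handoff of features between stages are not reproduced, and the `CrossEntropyLoss` choice is an assumption.

```python
# Hedged sketch of the reported layerwise training settings; the convex
# (Burer-Monteiro factorized) stage model itself is only a placeholder.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyperparameters quoted in the Experiment Setup row (CIFAR-10).
STAGE_LRS_CIFAR10 = [1e-1, 1e-2, 1e-3, 1e-2, 1e-2]  # one LR per stage
EPOCHS_PER_STAGE = 50
BATCH_SIZE = 128
WEIGHT_DECAY = 5e-4
MOMENTUM = 0.9
LR_DECAY_FACTOR = 0.2
LR_DECAY_EVERY = 15  # epochs


def build_stage_model(stage: int) -> nn.Module:
    """Placeholder for one convex-CNN stage; the paper's objective (23) with
    Burer-Monteiro factorization and gated ReLU is not reconstructed here."""
    return nn.Sequential(nn.Flatten(), nn.LazyLinear(10))


train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

criterion = nn.CrossEntropyLoss()  # assumed loss, not stated in the quote

for stage, lr in enumerate(STAGE_LRS_CIFAR10):
    model = build_stage_model(stage)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
    scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=LR_DECAY_EVERY, gamma=LR_DECAY_FACTOR)

    for epoch in range(EPOCHS_PER_STAGE):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    # In the paper's layerwise procedure, the trained stage would feed the
    # next stage's inputs; that step is omitted in this sketch.
```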