High-Dimensional Optimization in Adaptive Random Subspaces

Authors: Jonathan Lacotte, Mert Pilanci, Marco Pavone

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that the proposed approach offers significant speed ups in machine learning problems including logistic regression, kernel classification with random convolution layers and shallow neural networks with rectified linear units. ... 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. ... We evaluate Algorithm 1 on MNIST and CIFAR10. First, we aim to show that the sketching dimension can be considerably smaller than the original dimension while retaining (almost) the same test classification accuracy. Second, we aim to get significant speed-ups in achieving a high-accuracy classifier.
Researcher Affiliation | Academia | Jonathan Lacotte, Department of Electrical Engineering, Stanford University, lacotte@stanford.edu; Mert Pilanci, Department of Electrical Engineering, Stanford University; Marco Pavone, Department of Aeronautics & Astronautics, Stanford University
Pseudocode | Yes | Algorithm 1: Generic algorithm for adaptive sketching. ... Algorithm 2: Iterative adaptive sketching. (An illustrative code sketch of the adaptive sketching idea appears after this table.)
Open Source Code | No | The paper does not provide any explicit statement or link to its open-source code for the methodology.
Open Datasets | Yes | We evaluate Algorithm 1 on MNIST and CIFAR10. ... For both datasets, we use 50000 training and 10000 testing images.
Dataset Splits | No | The paper specifies 50000 training and 10000 testing images but does not mention a separate validation split or the methodology for creating one.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper cites 'scikit-learn: Machine learning in Python' [18], but does not specify software dependencies with version numbers for its own implementation.
Experiment Setup | Yes | To solve the primal program (1), we use two standard algorithms, stochastic gradient descent (SGD) with (best) fixed step size and stochastic variance reduction gradient (SVRG) [16] with (best) fixed step size and frequency update of the gradient correction. To solve the adaptive sketched program (2), we use SGD, SVRG and the sub-sampled Newton method [4, 10] which we refer to as Sketch-SGD, Sketch-SVRG and Sketch-Newton. ... For MNIST and CIFAR10, we choose respectively D = 10000 and γ = 0.02, and D = 60000 and γ = 0.002 ... using a one-vs-all procedure. ... for values of λ ∈ {10^-4, 5·10^-5, 10^-5, 5·10^-6} and sketching dimensions m ∈ {64, 128, 256, 512, 1024}. (An illustrative sweep over this grid appears after this table.)
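
The Pseudocode row refers to Algorithm 1 (generic adaptive sketching) and Algorithm 2 (iterative adaptive sketching) without reproducing them. The snippet below is a minimal illustrative sketch of the adaptive-subspace idea, not the authors' implementation: it assumes the primal program is l2-regularized logistic regression, uses a Gaussian sketch S of the data matrix A, and restricts the weights to the data-adaptive subspace spanned by the rows of SA, i.e. x = (SA)^T alpha. The function name, step-size rule and default values are all hypothetical.

```python
import numpy as np

def sketched_logistic_regression(A, y, m=256, lam=1e-4, n_iters=200, seed=0):
    """Illustrative sketch of optimization in an adaptive random subspace.

    Assumptions (not taken from the text quoted above):
      * primal program: min_x f(A x) + (lam / 2) * ||x||^2, where f is the
        average logistic loss over binary labels y in {0, 1},
      * S is a Gaussian sketching matrix applied to the data matrix A,
      * the weights are restricted to x = (S A)^T alpha with alpha in R^m.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch, shape (m, n)
    SA = S @ A                                     # sketched data, shape (m, d)
    B = A @ SA.T                                   # features of the low-dimensional problem, shape (n, m)

    # Fixed step size from a crude Lipschitz estimate of the smooth objective.
    L = np.linalg.norm(B, 2) ** 2 / (4.0 * n) + lam * np.linalg.norm(SA, 2) ** 2
    step = 1.0 / L

    alpha = np.zeros(m)
    for _ in range(n_iters):
        z = B @ alpha                                        # predictions A x
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))   # clipped sigmoid for stability
        grad = B.T @ (p - y) / n + lam * (SA @ (SA.T @ alpha))
        alpha -= step * grad
    return SA.T @ alpha                            # map back to a d-dimensional weight vector
```

Plain gradient descent with a Lipschitz-based fixed step is used here only to keep the example short; the Experiment Setup row indicates the authors instead solve the sketched program with SGD, SVRG and a sub-sampled Newton method (Sketch-SGD, Sketch-SVRG, Sketch-Newton).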
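
The regularization and sketching-dimension grids quoted in the Experiment Setup row can be swept with a simple loop. The harness below is hypothetical and runs on synthetic data (the paper's experiments use MNIST and CIFAR10 features with a one-vs-all scheme); it reuses the sketched_logistic_regression sketch above.

```python
import numpy as np

# Hypothetical sweep over the grid quoted in the Experiment Setup row,
# on synthetic data rather than the MNIST/CIFAR10 features used in the paper.
rng = np.random.default_rng(1)
A_train = rng.standard_normal((2000, 1500))    # n = 2000 samples, d = 1500 features
y_train = (A_train[:, 0] > 0).astype(float)    # synthetic binary labels in {0, 1}

for lam in [1e-4, 5e-5, 1e-5, 5e-6]:           # regularization values from the paper
    for m in [64, 128, 256, 512, 1024]:        # sketching dimensions from the paper
        x_hat = sketched_logistic_regression(A_train, y_train, m=m, lam=lam)
        acc = np.mean((A_train @ x_hat > 0) == (y_train > 0.5))
        print(f"lambda={lam:.0e}  m={m:4d}  train accuracy={acc:.3f}")
```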