Efficient Proximal Mapping of the 1-path-norm of Shallow Networks

Authors: Fabian Latorre, Paul Rolland, Nadav Hallak, Volkan Cevher

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In section 7, we present numerical evidence that our approach (i) converges faster and to lower values of the objective function, compared to plain SGD; (ii) generates sparse iterates; and (iii) the magnitude of the regularization parameter of the 1-path-norm allows a better accuracy-robustness trade-off than the common ℓ1 regularization or constraints on layer-wise matrix norms." "Our benchmarks are the MNIST (LeCun & Cortes, 2010), Fashion-MNIST (Xiao et al., 2017) and Kuzushiji-MNIST (Clanuwat et al., 2018)." (A sketch of the 1-path-norm computation is given below the table.)
Researcher Affiliation | Academia | "Laboratory for Information and Inference Systems (LIONS), EPFL, Switzerland. Correspondence to: Fabian Latorre <fabian.latorre@epfl.ch>."
Pseudocode | Yes | The paper provides Algorithm 1 (Prox-Grad Method), Algorithm 2 (Single-output robust-sparse proximal mapping), and Algorithm 3 (Multi-output robust-sparse proximal mapping). (A generic proximal-gradient sketch is given below the table.)
Open Source Code | No | The paper mentions using PyTorch and TensorFlow but does not provide any statement or link indicating that the source code for the proposed method is openly available or will be released.
Open Datasets | Yes | "Our benchmarks are the MNIST (LeCun & Cortes, 2010), Fashion-MNIST (Xiao et al., 2017) and Kuzushiji-MNIST (Clanuwat et al., 2018)." (A data-loading sketch is given below the table.)
Dataset Splits | No | The paper mentions training and testing but does not give details of a validation set or explicit train/validation/test split percentages; only a 'test set' is mentioned explicitly.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the use of 'PyTorch (Paszke et al., 2019) or TensorFlow (Abadi et al., 2015)' but does not provide specific version numbers for these or any other software dependencies required for reproducibility.
Experiment Setup | Yes | "For a wide range of learning rates, number of hidden neurons and regularization parameters λ, we train networks with SGD and Proximal-SGD (with constant learning rate). We do so for 20 epochs and with batch size set to 100." (A Proximal-SGD training-loop sketch is given below the table.)
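
The Research Type row quotes the paper's claim that tuning the 1-path-norm regularizer yields a better accuracy-robustness trade-off than plain ℓ1 regularization. For context, below is a minimal sketch of how the 1-path-norm of a one-hidden-layer network can be computed, assuming the usual definition as the sum, over all input-output paths, of products of absolute weights; the function names and the ℓ1 comparison are illustrative, not the authors' code.

```python
import torch

def one_path_norm(W: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """1-path-norm of a shallow network x -> V @ sigma(W @ x).

    Sums |V[k, j]| * |W[j, i]| over every input-output path (i, j, k),
    which factorizes as the entrywise sum of |V| @ |W|.
    """
    return (V.abs() @ W.abs()).sum()

def l1_norm(W: torch.Tensor, V: torch.Tensor) -> torch.Tensor:
    """Plain l1 penalty over the same parameters, for comparison."""
    return W.abs().sum() + V.abs().sum()

# Tiny example: 3 inputs, 4 hidden units, 2 outputs.
W = torch.randn(4, 3)  # first-layer weights (hidden x inputs)
V = torch.randn(2, 4)  # second-layer weights (outputs x hidden)
print(one_path_norm(W, V).item(), l1_norm(W, V).item())
```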
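The Pseudocode row lists Algorithm 1 (Prox-Grad Method) alongside the closed-form proximal mappings of Algorithms 2 and 3, which are not reproduced here. The sketch below shows only the generic proximal-gradient update w ← prox_{ηλR}(w − η∇L(w)), with ℓ1 soft-thresholding as a familiar stand-in for the paper's 1-path-norm proximal mapping.

```python
import torch

def soft_threshold(x: torch.Tensor, tau: float) -> torch.Tensor:
    """Proximal mapping of tau * ||.||_1, used here only as a stand-in for
    the paper's 1-path-norm proximal mappings (Algorithms 2 and 3)."""
    return torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)

def prox_grad_step(w: torch.Tensor, grad: torch.Tensor,
                   lr: float, lam: float) -> torch.Tensor:
    """One proximal-gradient update: a gradient step on the smooth loss
    followed by a proximal step on the nonsmooth regularizer."""
    return soft_threshold(w - lr * grad, lr * lam)
```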
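All three benchmarks quoted in the Open Datasets row are publicly available, for example through torchvision; a minimal loading sketch (assuming torchvision is installed, with batch size 100 matching the Experiment Setup row) follows.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST, Fashion-MNIST and Kuzushiji-MNIST training sets.
train_sets = {
    "MNIST": datasets.MNIST("data", train=True, download=True, transform=to_tensor),
    "Fashion-MNIST": datasets.FashionMNIST("data", train=True, download=True, transform=to_tensor),
    "Kuzushiji-MNIST": datasets.KMNIST("data", train=True, download=True, transform=to_tensor),
}

# Mini-batches of size 100, as in the reported experiment setup.
loaders = {name: DataLoader(ds, batch_size=100, shuffle=True)
           for name, ds in train_sets.items()}
```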
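Finally, the Experiment Setup row describes training with SGD and Proximal-SGD at a constant learning rate for 20 epochs with batch size 100. The skeleton below is a sketch under those stated settings; the network width, learning rate and λ are placeholder values, and the elementwise soft-thresholding in `prox_step` is only a stand-in for the paper's 1-path-norm proximal mapping.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prox_step(model: nn.Module, lr: float, lam: float) -> None:
    """Stand-in proximal step: elementwise l1 soft-thresholding applied to
    every parameter for simplicity. The paper instead applies its
    closed-form 1-path-norm mappings to the two weight matrices."""
    for p in model.parameters():
        p.copy_(torch.sign(p) * torch.clamp(p.abs() - lr * lam, min=0.0))

def train_prox_sgd(loader, n_hidden=100, lr=0.1, lam=1e-3, epochs=20):
    # One-hidden-layer (shallow) network on 28x28 grayscale inputs, 10 classes.
    model = nn.Sequential(nn.Flatten(), nn.Linear(784, n_hidden),
                          nn.ReLU(), nn.Linear(n_hidden, 10))
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # constant learning rate
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()                 # SGD step on the smooth loss
            prox_step(model, lr, lam)  # proximal step on the regularizer
    return model
```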