A Spline Theory of Deep Learning

Authors: Randall Balestriero, Richard G. Baraniuk

Venue: ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now empirically demonstrate that orthogonal templates lead to significantly improved classification performance. We conducted a range of experiments with three different conventional DN architectures (small CNN, large CNN, and ResNet4-4) trained on three different datasets (SVHN, CIFAR10, and CIFAR100)."
Researcher Affiliation | Academia | "Randall Balestriero 1, Richard G. Baraniuk 1. 1 ECE Department, Rice University, Houston, TX, USA. Correspondence to: Randall B. <randallbalestriero@gmail.com>."
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide a direct link to open-source code for the methodology described, nor does it explicitly state that the code is released or available in supplementary materials.
Open Datasets | Yes | "We conducted a range of experiments with three different conventional DN architectures (small CNN, large CNN, and ResNet4-4) trained on three different datasets (SVHN, CIFAR10, and CIFAR100)."
Dataset Splits | No | The paper mentions using 'standard test data sets like CIFAR100' and reporting 'average performance and standard deviation' over runs, but does not provide specific details on the train/validation/test splits (e.g., percentages, sample counts, or an explicit splitting methodology).
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used to run the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | "For learning, we used the Adam optimizer with an exponential learning rate decay. All inputs were centered to zero mean and scaled to a maximum value of one. ... Each DN employed bias units, ReLU activations, and max-pooling as well as batch-normalization prior to each ReLU."
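
The Experiment Setup and Research Type rows describe the training recipe only at a high level: Adam with an exponential learning-rate decay, inputs centered to zero mean and scaled to a maximum value of one, and networks with bias units, max-pooling, and batch-normalization placed before each ReLU, evaluated with an orthogonal-template regularizer. The sketch below is one possible reading of that recipe in PyTorch, not the authors' code: the framework, the exact small-CNN layout, the learning rate, the decay factor gamma, the penalty weight lam, the per-sample normalization, and the precise form of the orthogonality penalty (applied here to the rows of the final linear layer) are all assumptions.

```python
# Hypothetical reconstruction of the reported setup; every unreported detail
# (layer sizes, learning rate, gamma, lam, penalty form) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Illustrative 'small CNN': conv blocks with bias units, max-pooling,
    and batch-normalization placed before each ReLU, as in the report."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=True),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1, bias=True),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # 32x32 inputs (CIFAR10/SVHN) -> 8x8 feature maps after two poolings.
        self.classifier = nn.Linear(64 * 8 * 8, num_classes, bias=True)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def orthogonality_penalty(weight):
    """Generic penalty pushing the rows of the final linear layer toward
    orthogonality; a stand-in for the paper's template-orthogonality term,
    whose exact form may differ."""
    gram = weight @ weight.t()
    off_diag = gram - torch.diag(torch.diag(gram))
    return off_diag.pow(2).sum()


model = SmallCNN(num_classes=10)  # 10 classes for SVHN / CIFAR10
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed
# "Exponential learning rate decay"; the decay factor is assumed.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)


def train_step(x, y, lam=1.0):
    # Inputs centered to zero mean and scaled to a maximum value of one;
    # done here per sample, which is itself an assumption.
    x = x - x.mean(dim=(1, 2, 3), keepdim=True)
    x = x / x.abs().amax(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8)
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    loss = loss + lam * orthogonality_penalty(model.classifier.weight)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A full run would loop train_step over minibatches and call scheduler.step() once per epoch; none of these hyperparameter choices are confirmed by the report above.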