A Spline Theory of Deep Learning
Authors: Randall Balestriero, Richard G. Baraniuk
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now empirically demonstrate that orthogonal templates lead to significantly improved classification performance. We conducted a range of experiments with three different conventional DN architectures (small CNN, large CNN, and ResNet4-4) trained on three different datasets (SVHN, CIFAR10, and CIFAR100). |
| Researcher Affiliation | Academia | Randall Balestriero (1), Richard G. Baraniuk (1). (1) ECE Department, Rice University, Houston, TX, USA. Correspondence to: Randall B. <randallbalestriero@gmail.com>. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the methodology described, nor does it explicitly state that the code is released or available in supplementary materials. |
| Open Datasets | Yes | We conducted a range of experiments with three different conventional DN architectures small CNN, large CNN, and Res Net4-4 trained on three different datasets SVHN, CIFAR10, and CIFAR100. |
| Dataset Splits | No | The paper mentions using 'standard test data sets like CIFAR100' and 'average performance and standard deviation' over runs, but does not provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | For learning, we used the Adam optimizer with an exponential learning rate decay. All inputs were centered to zero mean and scaled to a maximum value of one. ... Each DN employed bias units, ReLU activations, and max-pooling, as well as batch normalization prior to each ReLU. (A minimal code sketch of this setup follows the table.) |
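
The quoted setup (Adam with exponential learning-rate decay, inputs centered to zero mean and scaled to a maximum value of one, bias units, and batch normalization applied before each ReLU) is not accompanied by code in the paper. Below is a minimal PyTorch sketch of one plausible reading of that description; the channel widths, learning rate, and decay factor are illustrative assumptions, not values reported by the authors, and `SmallCNN` is a hypothetical stand-in for the paper's smallCNN architecture.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Illustrative small-CNN block: Conv (with bias) -> BatchNorm -> ReLU -> MaxPool,
# following the quoted recipe of batch normalization prior to each ReLU.
# Depth and channel widths are assumptions, not the paper's exact architecture.
class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=True),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=True),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

def normalize_inputs(x):
    # Center to zero mean and scale so the maximum absolute value is one,
    # per the quoted preprocessing (this interpretation is ours).
    x = x - x.mean()
    return x / x.abs().max().clamp(min=1e-8)

model = SmallCNN(num_classes=10)
optimizer = optim.Adam(model.parameters(), lr=1e-3)                   # learning rate assumed
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)   # decay factor assumed
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random tensors standing in for an SVHN/CIFAR batch.
images = normalize_inputs(torch.randn(16, 3, 32, 32))
labels = torch.randint(0, 10, (16,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
scheduler.step()
```

The paper does not state whether the exponential decay is applied per batch or per epoch; the per-epoch convention (calling `scheduler.step()` once per epoch) is the more common choice and is what the sketch above would typically use in a full training loop.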