DNNs as Layers of Cooperating Classifiers

Authors: Marelie Davel, Marthinus Theunissen, Arnold Pretorius, Etienne Barnard (pp. 3725-3732)

Venue: AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments in a relatively simple setup. Our aim is to understand trends, while retaining the key elements that are likely to be common to high-performance DNNs. Thus, we use only fully-connected feedforward networks with highly regular topologies, and investigate their behavior on two widely-used image-recognition tasks, namely MNIST (LeCun et al. 1998) and FMNIST (Xiao, Rasul, and Vollgraf 2017).
Researcher Affiliation | Academia | Marelie H. Davel, Marthinus W. Theunissen, Arnold M. Pretorius, and Etienne Barnard: Multilingual Speech Technologies, North-West University, South Africa; and CAIR, South Africa.
Pseudocode | No | The paper describes mathematical derivations and processes (e.g., Equation 10) but does not present them in a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | Thus, we use only fully-connected feedforward networks with highly regular topologies, and investigate their behavior on two widely-used image-recognition tasks, namely MNIST (LeCun et al. 1998) and FMNIST (Xiao, Rasul, and Vollgraf 2017). (See the loading sketch below the table.)
Dataset Splits | No | The paper states 'We implement early stopping by choosing networks with the smallest validation error.' and mentions test set sizes, but does not provide specific percentages or counts for the training, validation, and test splits needed for reproduction, nor does it cite a standard split with author/year.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'The popular Adam (Kingma and Ba 2014) optimizer is used' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | All hidden nodes have Rectified Linear Unit (ReLU) activation functions, and a standard mean squared error (MSE) loss function is employed, unless stated otherwise. The popular Adam (Kingma and Ba 2014) optimizer is used to train the networks after normalized uniform initialization with three different training seeds (LeCun et al. 2012), and the global learning rates are manually adjusted to ensure training set convergence. Our first analysis investigates several networks of fixed width and increasing depth. Depth here refers to the number of hidden layers, without counting the input or output layers. For a width of 100 nodes per layer, both the MNIST and FMNIST systems initially achieve decreasing error rates as the number of hidden layers grows, but the performance quickly saturates. In the second analysis, network depth is kept constant at 10 layers, and the width (number of nodes per layer) is adjusted. (See the training sketch below the table.)
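
The Open Datasets and Dataset Splits rows confirm that MNIST and FMNIST are used but give no split sizes. The sketch below shows one way to load both datasets and hold out a validation set for the early stopping the paper mentions; the framework (PyTorch/torchvision), the 90/10 split, the batch sizes, and the seed are assumptions, not details reported in the paper.

```python
# Minimal sketch: load MNIST / FMNIST and carve out a validation set for early stopping.
# PyTorch/torchvision, the 90/10 split, batch sizes, and seed are assumptions;
# the paper does not state any of them.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST (LeCun et al. 1998); for FMNIST (Xiao, Rasul, and Vollgraf 2017),
# substitute datasets.FashionMNIST with the same arguments.
train_full = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
test_set = datasets.MNIST("data", train=False, download=True, transform=to_tensor)

# Hypothetical 90/10 split: the paper only says it keeps the network with the
# smallest validation error, not how the validation set is constructed.
n_val = len(train_full) // 10
train_set, val_set = random_split(
    train_full, [len(train_full) - n_val, n_val],
    generator=torch.Generator().manual_seed(0),
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=256)
test_loader = DataLoader(test_set, batch_size=256)
```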
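The Experiment Setup row describes fully-connected ReLU networks of varying depth and width, trained with an MSE loss and the Adam optimizer after normalized uniform initialization. Below is a minimal sketch of that configuration; it assumes PyTorch, reads "normalized uniform initialization" as Xavier-style uniform initialization, and uses an illustrative learning rate and depth sweep rather than the paper's exact values.

```python
# Minimal sketch of the networks in the Experiment Setup row: fully-connected feedforward
# nets with ReLU hidden layers, an MSE loss on one-hot targets, and the Adam optimizer.
# PyTorch, the Xavier-style reading of "normalized uniform initialization", the learning
# rate, and the depth values in the sweep are assumptions, not the paper's exact settings.
import torch
from torch import nn

def make_mlp(depth: int, width: int, in_dim: int = 28 * 28, n_classes: int = 10) -> nn.Sequential:
    """Build `depth` hidden ReLU layers of `width` nodes each, plus a linear output layer."""
    layers, prev = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(prev, width), nn.ReLU()]
        prev = width
    layers.append(nn.Linear(prev, n_classes))
    model = nn.Sequential(*layers)
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)  # assumed reading of "normalized uniform" init
            nn.init.zeros_(m.bias)
    return model

def train_epoch(model, loader, optimizer):
    """One epoch with the MSE loss on one-hot class targets."""
    mse = nn.MSELoss()
    model.train()
    for x, y in loader:
        x = x.view(x.size(0), -1)                                  # flatten 28x28 images
        target = nn.functional.one_hot(y, num_classes=10).float()  # one-hot targets for MSE
        optimizer.zero_grad()
        loss = mse(model(x), target)
        loss.backward()
        optimizer.step()

# First analysis: fixed width of 100 nodes per layer, increasing depth (hidden layers only).
for depth in [1, 2, 4, 8, 10]:       # illustrative depths, not the paper's exact sweep
    torch.manual_seed(0)             # the paper repeats training with three different seeds
    model = make_mlp(depth, width=100)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rates were tuned manually
    # train_epoch(model, train_loader, optimizer)  # repeat until training-set convergence

# Second analysis: depth fixed at 10 hidden layers while the width is varied,
# e.g. make_mlp(10, width=w) for a range of w values.
```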