Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers

Authors: Masoumeh Soflaei, Hongyu Guo, Ali Al-Bashabsheh, Yongyi Mao, Richong Zhang

AAAI 2020, pp. 5810-5817

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments, applying AgrLearn to the current art of deep learning architectures for image and text classification. Our experimental results suggest that AgrLearn brings significant gain in classification accuracy.
Researcher Affiliation | Academia | (1) University of Ottawa, Ottawa, Canada; (2) National Research Council Canada; (3) Beijing Advanced Institution on Big Data and Brain Computing, Beihang University, Beijing, China
Pseudocode | Yes | Algorithm 1: Training in n-fold AgrLearn (a hedged sketch of this training loop appears after the table).
Open Source Code | Yes | Our implementation of AgrLearn is available at https://github.com/SITE5039/AgrLearn
Open Datasets | Yes | Experiments are conducted on the CIFAR-10 and CIFAR-100 datasets with two widely used deep network architectures, namely ResNet... using two benchmark sentence-classification datasets, Movie Review (Pang and Lee 2005) and Subjectivity (Pang and Lee 2004).
Dataset Splits | No | The paper specifies training and test splits (e.g., '50,000 training images, 10,000 test images' for CIFAR-10; '10% of random examples in each dataset for testing and the rest for training' for text classification) but does not explicitly describe a separate validation split held out for hyperparameter tuning (a sketch of the described text split appears after the table).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions architectures such as ResNet, Wide ResNet, CNN, and LSTM, and references TensorFlow (Abadi et al. 2016) and specific implementations (Liu 2017; Zagoruyko and Komodakis 2016a), but it does not specify version numbers for these software components or libraries.
Experiment Setup | Yes | We use mini-batched backprop for 400 epochs with exactly the same hyper-parameter settings without dropout. Specifically, weight decay is 10^-4, and each mini-batch contains 64 aggregated training examples. The learning rate for the main network is set to 0.1 initially and decays by a factor of 10 after 100, 150, and 250 epochs. (An illustrative configuration with these settings appears after the table.)
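
The pseudocode row refers to Algorithm 1, training in n-fold AgrLearn. As a rough illustration only, the sketch below groups n training examples into a single aggregated input and sums cross-entropy over n label slots; the channel-wise concatenation, the helper names aggregate_batch and agrlearn_loss, and the choice of PyTorch are our assumptions, not details confirmed by the quoted text (the authors' actual code is in the linked repository).

```python
import torch
import torch.nn.functional as F

def aggregate_batch(x, y, n):
    """Group a batch of images (B, C, H, W) and labels (B,) into aggregated
    examples: inputs are concatenated channel-wise in groups of n (assumption)."""
    b = (x.size(0) // n) * n                                # drop any remainder
    x, y = x[:b], y[:b]
    x_agg = x.reshape(b // n, n * x.size(1), *x.shape[2:])  # (B/n, n*C, H, W)
    y_agg = y.reshape(b // n, n)                            # one label per slot
    return x_agg, y_agg

def agrlearn_loss(logits, y_agg, n, num_classes):
    """Sum of per-slot cross-entropies; logits has shape (B/n, n*num_classes)."""
    logits = logits.reshape(-1, n, num_classes)
    return sum(F.cross_entropy(logits[:, k, :], y_agg[:, k]) for k in range(n))

# Example shapes for CIFAR-10 with n = 2: x is (128, 3, 32, 32), y is (128,);
# aggregate_batch returns x_agg of shape (64, 6, 32, 32) and y_agg of shape (64, 2).
```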
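
For the text datasets, the quoted setup holds out 10% of randomly chosen examples for testing and trains on the rest, with no validation split mentioned. A minimal sketch of such a split is below; the fixed seed and the helper name split_90_10 are illustrative assumptions.

```python
import random

def split_90_10(examples, seed=0):
    """Randomly hold out 10% of examples for testing; train on the remaining 90%."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) // 10
    test = [examples[i] for i in idx[:cut]]
    train = [examples[i] for i in idx[cut:]]
    return train, test
```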
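
The experiment-setup row reports 400 epochs, weight decay 10^-4, mini-batches of 64 aggregated examples, and an initial learning rate of 0.1 decayed by a factor of 10 after epochs 100, 150, and 250. The sketch below restates those numbers as a training configuration, assuming PyTorch and SGD with momentum 0.9 (the framework and momentum value are not stated in the quote); the placeholder model stands in for the ResNet / Wide ResNet and text models actually used.

```python
import torch

# Placeholder model; the paper trains ResNet / Wide ResNet (images) and CNN / LSTM (text).
model = torch.nn.Linear(3 * 32 * 32, 10)

# Reported settings: lr 0.1, weight decay 1e-4; momentum 0.9 is our assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Decay the learning rate by a factor of 10 after epochs 100, 150, and 250.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[100, 150, 250],
                                                 gamma=0.1)

for epoch in range(400):
    # ... one pass over mini-batches of 64 aggregated training examples ...
    scheduler.step()
```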