Learning with Pseudo-Ensembles

Authors: Philip Bachman, Ouais Alsharif, Doina Precup

NeurIPS 2014

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
"We tested PEA regularization in three scenarios: supervised learning on MNIST digits, semi-supervised learning on MNIST digits, and semi-supervised transfer learning on a dataset from the NIPS 2011 Workshop on Challenges in Learning Hierarchical Models [13]. Full implementations of our methods, written with THEANO [3], and scripts/instructions for reproducing all of the results in this section are available online at: http://github.com/Philip-Bachman/Pseudo-Ensembles."

Researcher Affiliation: Academia
Philip Bachman, McGill University, Montreal, QC, Canada (phil.bachman@gmail.com); Ouais Alsharif, McGill University, Montreal, QC, Canada (ouais.alsharif@gmail.com); Doina Precup, McGill University, Montreal, QC, Canada (dprecup@cs.mcgill.ca)

Pseudocode: No
No pseudocode or algorithm blocks were found.

Open Source Code: Yes
"Full implementations of our methods, written with THEANO [3], and scripts/instructions for reproducing all of the results in this section are available online at: http://github.com/Philip-Bachman/Pseudo-Ensembles." All code required for these experiments is publicly available online.

Open Datasets: Yes
"The MNIST dataset comprises 60k 28x28 grayscale hand-written digit images for training and 10k images for testing." "The labeled data source was CIFAR-100 [11], which contains 50k 32x32 color images in 100 classes. The unlabeled data source was a collection of 100k 32x32 color images taken from Tiny Images [11]." "We now show how the Recursive Neural Tensor Network (RNTN) from [19] can be adapted using pseudo-ensembles, and evaluate it on the Stanford Sentiment Treebank (STB) task."

Dataset Splits: No
The paper describes splitting the training samples into labeled/unlabeled subsets and testing on a separate test set, but it does not explicitly describe a distinct validation split (e.g., sizes, percentages, or selection method) for model tuning or early stopping.

Hardware Specification: No
The paper does not specify the hardware used, such as GPU or CPU models; it only mentions that the methods were written with THEANO.

Software Dependencies: No
The paper mentions THEANO [3] but does not specify a version number for THEANO or any other software dependency, which would be needed for exact reproducibility.

Experiment Setup: Yes
"For the supervised tests we used SGD hyperparameters roughly following those in [9]. We trained networks with two hidden layers of 800 nodes each, using rectified-linear activations and an ℓ2-norm constraint of 3.5 on incoming weights for each node. We initialized hidden layer biases to 0.1, output layer biases to 0, and inter-layer weights to zero-mean Gaussian noise with σ = 0.01. We trained all networks for 1000 epochs with no early-stopping."
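The quoted setup pins down the architecture and initialization precisely enough to sketch. The following is a minimal NumPy illustration of that description, not the authors' THEANO code: two rectified-linear hidden layers of 800 units, zero-mean Gaussian weight initialization with σ = 0.01, hidden biases at 0.1, output biases at 0, and the ℓ2-norm cap of 3.5 on each node's incoming weights (applied as a rescaling after parameter updates). The input/output sizes assume MNIST (784 pixels, 10 classes).

```python
import numpy as np

def init_params(n_in=784, n_hid=800, n_out=10, sigma=0.01, seed=0):
    """Initialize weights and biases as described in the paper's setup."""
    rng = np.random.default_rng(seed)
    shapes = [(n_in, n_hid), (n_hid, n_hid), (n_hid, n_out)]
    # Inter-layer weights: zero-mean Gaussian noise with std sigma.
    Ws = [rng.normal(0.0, sigma, s) for s in shapes]
    # Hidden-layer biases at 0.1, output-layer biases at 0.
    bs = [np.full(n_hid, 0.1), np.full(n_hid, 0.1), np.zeros(n_out)]
    return Ws, bs

def clip_incoming_norms(W, max_norm=3.5):
    """Rescale each column (one node's incoming weights) so its l2 norm
    does not exceed max_norm; typically applied after each SGD update."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return W * scale

def forward(x, Ws, bs):
    """Forward pass: two ReLU hidden layers, then linear output logits."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = np.maximum(0.0, h @ W + b)  # rectified-linear activation
    return h @ Ws[-1] + bs[-1]
```

The training loop itself (SGD for 1000 epochs, no early stopping, plus the PEA noise-perturbed passes) is omitted; the authors' full THEANO implementation is in the linked repository.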