Universum Prescription: Regularization Using Unlabeled Data
Authors: Xiang Zhang, Yann LeCun
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper shows that simply prescribing "none of the above" labels to unlabeled data has a beneficial regularization effect to supervised learning. We call it universum prescription by the fact that the prescribed labels cannot be one of the supervised labels. In spite of its simplicity, universum prescription obtained competitive results in training deep convolutional networks for CIFAR-10, CIFAR-100, STL-10 and ImageNet datasets. (A minimal sketch of this label prescription appears after the table.) |
| Researcher Affiliation | Academia | Xiang Zhang, Yann LeCun, Courant Institute of Mathematical Sciences, New York University, 719 Broadway, 12th Floor, New York, NY 10003, {xiang, yann}@cs.nyu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methods are described in narrative text. |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., a specific repository link or an explicit statement about code release in supplementary materials) for the methodology described. |
| Open Datasets | Yes | Experiments on Image Classification: In this section we test the methods on some image classification tasks. Three series of datasets CIFAR-10/100 (Krizhevsky 2009), STL-10 (Coates, Ng, and Lee 2011) and ImageNet (Russakovsky et al. 2015) are chosen due to the availability of unlabeled data. (See the dataset-loading sketch after the table.) |
| Dataset Splits | Yes | The ImageNet dataset (Russakovsky et al. 2015) for classification task has in total 1,281,167 training images and 50,000 validation images. The reported testing errors are evaluated on this validation dataset. |
| Hardware Specification | Yes | We gratefully acknowledge NVIDIA Corporation with the donation of 2 Tesla K40 GPUs used for this research. |
| Software Dependencies | No | The paper mentions the use of deep learning frameworks and algorithms but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific libraries like scikit-learn with versions). |
| Experiment Setup | Yes | The algorithm used is stochastic gradient descent with momentum (Polyak 1964; Sutskever et al. 2013) of 0.9 and a minibatch size of 32. The initial learning rate is 0.005, which is halved every 60,000 minibatch steps for CIFAR-10/100 and every 600,000 minibatch steps for ImageNet. The training stops at 400,000 steps for CIFAR-10/100 and STL-10, and 2,500,000 steps for ImageNet. Two dropout (Srivastava et al. 2014) layers of probability 0.5 are inserted before the final two linear layers. (A configuration sketch based on these hyper-parameters follows the table.) |
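
The paper releases no code (see the "Open Source Code" row), so the snippet below is only a minimal PyTorch-style sketch of the "none of the above" prescription quoted in the "Research Type" row: unlabeled samples are assigned an extra label outside the supervised classes and the usual cross-entropy loss is applied to the mixed minibatch. The toy network, tensor shapes, and batch composition are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper trains deep convolutional networks on image data.
NUM_SUPERVISED_CLASSES = 10              # e.g. CIFAR-10
DUSTBIN_CLASS = NUM_SUPERVISED_CLASSES   # extra "none of the above" label

# Toy classifier with one extra output unit for the prescribed label.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_SUPERVISED_CLASSES + 1),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def training_step(labeled_x, labeled_y, unlabeled_x):
    """One SGD step over a minibatch mixing labeled and unlabeled samples.

    Unlabeled samples are simply prescribed the extra label, so the same
    cross-entropy loss acts as a regularizer on the supervised problem.
    """
    x = torch.cat([labeled_x, unlabeled_x], dim=0)
    dustbin_y = torch.full((unlabeled_x.size(0),), DUSTBIN_CLASS, dtype=torch.long)
    y = torch.cat([labeled_y, dustbin_y], dim=0)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for labeled and unlabeled images.
loss = training_step(
    torch.randn(16, 3, 32, 32),
    torch.randint(0, NUM_SUPERVISED_CLASSES, (16,)),
    torch.randn(16, 3, 32, 32),
)
```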
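
The datasets quoted in the "Open Datasets" row are all publicly available; the sketch below shows one hedged way to obtain them with torchvision. The root directory and transform are placeholders, and ImageNet still requires the archives to be downloaded manually.

```python
import torchvision
from torchvision import transforms

to_tensor = transforms.ToTensor()  # placeholder transform

# Labeled training sets; CIFAR-10/100 download automatically.
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=to_tensor)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=to_tensor)

# STL-10 ships an explicit unlabeled split, which is what makes it
# convenient for universum prescription experiments.
stl10_unlabeled = torchvision.datasets.STL10(
    root="./data", split="unlabeled", download=True, transform=to_tensor)

# ImageNet (ILSVRC 2012) cannot be downloaded automatically; torchvision
# expects the archives to already be present under `root`.
# imagenet_train = torchvision.datasets.ImageNet(root="./imagenet", split="train")
```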
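
Finally, the hyper-parameters quoted in the "Experiment Setup" row translate directly into an optimizer and a step-based learning-rate schedule. The sketch below assumes PyTorch and a placeholder network; only the momentum, minibatch size, initial learning rate, halving interval, stopping step, and dropout probability come from the paper.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # CIFAR-10; the paper also runs CIFAR-100, STL-10, and ImageNet

# Placeholder network; the paper uses deep convolutional architectures.
# Two dropout layers of probability 0.5 sit before the final two linear layers.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, NUM_CLASSES + 1),  # +1 for the "none of the above" class
)

# SGD with momentum 0.9, minibatch size 32, initial learning rate 0.005.
BATCH_SIZE = 32
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

# Learning rate halved every 60,000 minibatch steps for CIFAR-10/100
# (600,000 for ImageNet); the scheduler is stepped once per minibatch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60_000, gamma=0.5)

MAX_STEPS = 400_000  # 2,500,000 for ImageNet
criterion = nn.CrossEntropyLoss()

# Illustrative loop over random tensors; a real run would iterate to MAX_STEPS
# over minibatches drawn from the datasets above.
for step in range(3):
    x = torch.randn(BATCH_SIZE, 3, 32, 32)
    y = torch.randint(0, NUM_CLASSES + 1, (BATCH_SIZE,))

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```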