Semi-Supervised Learning with Normalizing Flows
Authors: Pavel Izmailov, Polina Kirichenko, Marc Finzi, Andrew Gordon Wilson
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From the abstract: We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data, tabular data, and semi-supervised image classification. We also show that Flow GMM can discover interpretable structure, provide real-time optimization-free feature visualizations, and specify well calibrated predictive distributions. From Section 5 (Experiments): We evaluate Flow GMM on a wide range of datasets across different application domains including low-dimensional synthetic data (Section 5.1), text and tabular data (Section 5.2), and image data (Sections 5.3, 5.4). A minimal sketch of the FlowGMM classification rule appears after this table. |
| Researcher Affiliation | Academia | Pavel Izmailov*¹, Polina Kirichenko*¹, Marc Finzi*¹, Andrew Gordon Wilson¹ (¹New York University). Correspondence to: Pavel Izmailov <pi390@nyu.edu>. |
| Pseudocode | No | The paper describes the Expectation-Maximization algorithm in Appendix A but does not present it as formal pseudocode or an algorithm block. |
| Open Source Code | Yes | We also provide code at https://github.com/izmailovpavel/flowgmm. |
| Open Datasets | Yes | We evaluate Flow GMM on a wide range of datasets across different application domains including low-dimensional synthetic data (Section 5.1), text and tabular data (Section 5.2), and image data (Sections 5.3, 5.4). Along with standard tabular UCI datasets, we also consider text classification on AG-News and Yahoo Answers datasets. We evaluate Flow GMM in transfer learning setting on CIFAR-10 semi-supervised image classification. We next evaluate the proposed method on semi-supervised image classification benchmarks on CIFAR-10, MNIST and SVHN datasets. |
| Dataset Splits | Yes | For each of the datasets, a separate validation set of size 5k was used to tune hyperparameters. We test Flow GMM calibration on MNIST and CIFAR datasets in the supervised setting. On MNIST we restricted the training set size to 1000 objects, since on the full dataset the model makes too few mistakes which makes evaluating calibration harder. In Table 5, we report negative log likelihood and expected calibration error (ECE, see Guo et al. (2017) for a description of this metric). We can see that re-calibrating variances of the Gaussians in the mixture significantly improves both metrics and mitigates overconfidence. A generic sketch of the ECE computation follows this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory specifications) used for running its experiments. It discusses model architectures and training setups, but not the underlying hardware. |
| Software Dependencies | No | The paper mentions using specific models/algorithms like "Real NVP normalizing flow architecture", "ADAM optimizer", and "BERT transformer model", but does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the implementation. |
| Experiment Setup | Yes | Throughout training, Gaussian mixture parameters are fixed: the means are initialized randomly from the standard normal distribution and the covariances are set to I. In all experiments, we use the Real NVP normalizing flow architecture. We use the Real NVP architecture with 5 coupling layers, defined by fully-connected shift and scale networks, each with 1 hidden layer of size 512. The tuned learning rates for each of the models that we used for these experiments are shown in Table 6. We train our Flow GMM model with a Real NVP normalizing flow, similar to the architectures used in Papamakarios et al. (2017). Specifically, the model uses 7 coupling layers, with 1 hidden layer each and 256 hidden units for the UCI datasets but 1024 for text classification. UCI models were trained for 50 epochs of unlabeled data and the text datasets were trained for 200 epochs of unlabeled data. We use Adam optimizer (Kingma & Ba, 2014) with learning rate 10⁻³ for CIFAR-10 and SVHN and 10⁻⁴ for MNIST. We train the supervised model for 100 epochs, and semi-supervised models for 1000 passes through the labeled data for CIFAR-10 and SVHN and 3000 passes for MNIST. We use a batch size of 64 and sample 32 labeled and 32 unlabeled data points in each mini-batch. For the consistency loss term (7), we linearly increase the weight from 0 to 1 for the first 100 epochs following Athiwaratkun et al. (2019). For Flow GMM and Flow GMM-cons, we re-weight the loss on labeled data by λ = 3 (value tuned on validation in Kingma et al. (2014) on CIFAR-10), as otherwise, we observed that the method underfits the labeled data. Hedged sketches of a matching coupling layer and of a FlowGMM-cons-style training objective follow this table. |
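
The quoted evidence describes FlowGMM as a Gaussian mixture in the latent space of a normalizing flow, with randomly initialized means and identity covariances (see the Experiment Setup row). The sketch below is a minimal, hedged illustration of the resulting class-conditional likelihood and Bayes classification rule; it assumes a generic `flow` callable returning the latent code and the log-determinant of its Jacobian, and is not the authors' implementation (see the linked repository for that).

```python
import math
import torch

def class_log_likelihood(flow, means, x):
    """log p(x | y=k) for every class k under a latent Gaussian mixture.

    Assumes `flow` is an invertible map returning (z, log|det J|); the
    class-conditional densities are N(mu_k, I) in latent space, matching
    the fixed means and identity covariances described in the paper.
    """
    z, log_det = flow(x)                              # z: (B, D), log_det: (B,)
    diff = z.unsqueeze(1) - means.unsqueeze(0)        # (B, K, D)
    log_gauss = -0.5 * (diff ** 2).sum(dim=-1) \
                - 0.5 * means.shape[1] * math.log(2 * math.pi)
    return log_gauss + log_det.unsqueeze(1)           # change of variables

def predict(flow, means, x):
    """Bayes classifier with uniform class priors: argmax_k p(x | y=k)."""
    return class_log_likelihood(flow, means, x).argmax(dim=1)
```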
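
The Experiment Setup row describes Real NVP coupling layers with fully-connected shift and scale networks (one hidden layer of 512 units in the synthetic experiments). A coupling layer of that shape is sketched below; the masking scheme, nonlinearity, and scale bounding are common Real NVP choices assumed here rather than details quoted from the paper.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One Real NVP affine coupling layer with a fully-connected
    shift-and-scale network (one hidden layer, 512 units by default)."""

    def __init__(self, dim, hidden=512):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)           # scale and shift for x2
        s = torch.tanh(s)                             # bound scales (assumed) for stability
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)                        # log|det J| of the affine map
        return torch.cat([x1, y2], dim=1), log_det
```

Stacking five such layers, swapping which half is transformed between layers and summing the per-layer log-determinants, gives a `flow` callable compatible with the previous sketch.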
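
The same row reports mini-batches of 32 labeled and 32 unlabeled examples, a labeled-loss weight λ = 3, and a consistency weight ramped linearly from 0 to 1 over the first 100 epochs. The sketch below assembles a FlowGMM-cons-style objective from those quoted ingredients, reusing `class_log_likelihood` from the first sketch. The exact form of the paper's consistency term (its Eq. (7)) is not quoted here, so the squared difference between class posteriors on two augmented views is an assumption standing in for it.

```python
import math
import torch
import torch.nn.functional as F

def flowgmm_cons_loss(flow, means, x_lab, y_lab, x_unlab, x_unlab_aug,
                      lam=3.0, cons_weight=1.0):
    """Hedged sketch of a FlowGMM-cons-style objective (not the reference code).

    Labeled term: -log p(x, y), reweighted by lam (= 3 in the paper).
    Unlabeled term: -log p(x) = -logsumexp_k log p(x, y=k).
    Consistency term: squared difference between class posteriors on two
    augmented views of the unlabeled batch (assumed form of the paper's Eq. (7)).
    Reuses class_log_likelihood from the first sketch above.
    """
    log_prior = -math.log(means.shape[0])                 # uniform p(y)

    ll_lab = class_log_likelihood(flow, means, x_lab)     # (B, K)
    labeled_nll = -(ll_lab.gather(1, y_lab[:, None]).squeeze(1) + log_prior).mean()

    ll_unlab = class_log_likelihood(flow, means, x_unlab)
    unlabeled_nll = -torch.logsumexp(ll_unlab + log_prior, dim=1).mean()

    p1 = F.softmax(ll_unlab, dim=1)                       # posteriors under uniform prior
    p2 = F.softmax(class_log_likelihood(flow, means, x_unlab_aug), dim=1)
    consistency = ((p1 - p2) ** 2).sum(dim=1).mean()

    return lam * labeled_nll + unlabeled_nll + cons_weight * consistency
```

A training loop would pair this with `torch.optim.Adam(flow.parameters(), lr=1e-3)` and set `cons_weight = min(epoch / 100.0, 1.0)` to mimic the linear ramp described above.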
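
The calibration evidence under Dataset Splits cites expected calibration error (Guo et al., 2017) without defining it. The sketch below is a generic ECE computation for reference; the number of bins is an assumed default, since the paper's binning is not quoted.

```python
import torch

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE (Guo et al., 2017): bin predictions by confidence and average
    |accuracy - confidence| weighted by the fraction of samples per bin.
    n_bins=15 is an assumed default, not taken from the paper.
    """
    bin_edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            accuracy = (predictions[in_bin] == labels[in_bin]).float().mean()
            confidence = confidences[in_bin].mean()
            ece += in_bin.float().mean() * (accuracy - confidence).abs()
    return ece
```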