Neural PCA for Flow-Based Representation Learning

Authors: Shen Li, Bryan Hooi

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically find that state-of-the-art NFs yield poor representations in terms of discriminativeness and informativeness. We attribute this to the mathematical fact that an NF preserves dimensionality throughout the transform so that invertibility is guaranteed. This apparently violates the manifold hypothesis that data resides in a lower-dimensional manifold embedded in a full Euclidean space [Fefferman et al., 2016]. Consequently, dimensionality preservation leads to redundant representations, which undermines the desired representations, as shown in our empirical studies. To alleviate the conflict with the manifold hypothesis, we propose Neural-PCA, a flow model that operates in a full-dimensional space while capturing principal components in descending order. Without exploiting any label information, the principal components recovered in an unsupervised way store the most informative elements in leading dimensions, allowing for clear improvements in downstream tasks (e.g., classification and mutual information estimation). Empirically, we find that such improvements are consistent irrespective of the number of trailing dimensions of latent codes dropped, which serves as evidence for the manifold hypothesis. At the same time, dropping leading dimensions results in a significant decline in classification accuracy and mutual information estimation, which indicates that leading dimensions capture the most dominant variations of data whereas trailing dimensions store less informative ones. Neural-PCA preserves exact invertibility on the one hand, while on the other respecting the manifold hypothesis for interpretability and better representations in downstream tasks. Specifically, a Neural-PCA can be constructed by appending the proposed PCA block to any regular normalizing flow, continuous or discrete. A PCA block is a bijection that allows for an easy inverse and an efficient evaluation of its Jacobian determinant (a hedged sketch of such a block appears after this table). We further show that, to encourage principal component learning, a non-isotropic base density is a desirable choice along with Neural-PCA. Moreover, in terms of sample generation and inference, we propose an efficient approach for learning orthogonal statistics in SO(n) so that Neural-PCA can be properly evaluated and inverted. Experimental results suggest that Neural-PCA captures principal components in an unsupervised yet generative manner, improving performance in downstream tasks while maintaining the ability to generate visually authentic images and to perform density estimation.
Researcher Affiliation | Academia | Shen Li, Bryan Hooi, Institute of Data Science, National University of Singapore (shen.li@u.nus.edu, bhooi@comp.nus.edu.sg)
Pseudocode | Yes | Algorithm 1: Training algorithm of Neural-PCA (a hedged training-loop sketch appears after this table).
Open Source Code | Yes | Code is publicly available at https://github.com/MathsShen/Neural-PCA.
Open Datasets | Yes | We conduct experiments on a two-dimensional toy dataset, Two-Spiral.
Dataset Splits | Yes | For each κ, we train a classifier using z_{κ+} or z_{κ-} of the training split of a dataset, choose the best model using the validation split, and finally evaluate classification accuracy on the test split (a sketch of this protocol appears after this table).
Hardware Specification | No | The paper describes the experimental setup and training details but does not specify any hardware components such as GPU models, CPU types, or other computing resources used for running the experiments.
Software Dependencies | No | The paper mentions using an 'Adam optimizer' and 'SVM with linear kernel' as parts of its methodology. However, it does not provide specific version numbers for any software, libraries, or frameworks (e.g., Python version, PyTorch/TensorFlow version, scikit-learn version) that would be necessary for exact reproduction.
Experiment Setup | Yes | The batch size for training is set to 100 for all models. For Neural-PCA, the proposed projection method (cf. Section 3.3) is utilized to aggregate all rotation matrices computed from different batches (a hedged sketch of one such projection appears after this table).
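
The sketches below are illustrative only; they are not the authors' released implementation, and all class, function, and parameter names are assumptions. First, the abstract describes the PCA block as a bijection appended to a regular flow, with an easy inverse and an efficiently evaluated Jacobian determinant, trained against a non-isotropic base density. A minimal PyTorch sketch of one way such a rotation bijection and base density could look:

```python
# Hypothetical sketch, not the authors' code: a "PCA block" as an orthogonal
# linear bijection z = R h, with R obtained by projecting a free parameter onto
# (roughly) SO(n). Because R is orthogonal, the inverse is R^T and log|det J| = 0,
# so the block is cheap to invert and its Jacobian determinant is trivial.
import torch
import torch.nn as nn


class PCABlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Unconstrained parameter; projected to the nearest orthogonal matrix before use.
        self.weight = nn.Parameter(torch.eye(dim))

    def rotation(self):
        u, _, vh = torch.linalg.svd(self.weight)  # SVD-based orthogonal projection
        return u @ vh                             # may need a det(+1) correction for strict SO(n)

    def forward(self, h):
        R = self.rotation()
        z = h @ R.T
        log_det = torch.zeros(h.shape[0], device=h.device)  # |det R| = 1
        return z, log_det

    def inverse(self, z):
        return z @ self.rotation()


class NonIsotropicGaussian(nn.Module):
    """Diagonal Gaussian with per-dimension variances (an assumed form of the
    non-isotropic base density), so that leading dimensions can absorb the
    dominant variations in descending order."""
    def __init__(self, dim):
        super().__init__()
        self.log_var = nn.Parameter(torch.linspace(0.0, -4.0, dim))  # descending initial variances

    def log_prob(self, z):
        var = self.log_var.exp()
        log2pi = torch.log(torch.tensor(2.0 * torch.pi, device=z.device))
        return (-0.5 * (z ** 2 / var + self.log_var + log2pi)).sum(dim=-1)
```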
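Second, the pseudocode row refers to Algorithm 1, the training algorithm of Neural-PCA, which is not reproduced here. The following is only a plausible maximum-likelihood reading of how one training step could be wired up, reusing the assumed classes from the previous sketch:

```python
# Hypothetical training step (not the paper's Algorithm 1): standard
# change-of-variables maximum likelihood through the regular flow followed by
# the PCA block, under the non-isotropic base density.
def train_step(flow, pca_block, base, x, optimizer):
    h, log_det_flow = flow(x)         # regular NF: x -> h, with its log|det J|
    z, log_det_pca = pca_block(h)     # PCA block: h -> z (log|det J| = 0)
    nll = -(base.log_prob(z) + log_det_flow + log_det_pca).mean()
    optimizer.zero_grad()
    nll.backward()
    optimizer.step()
    return nll.item()
```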
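Third, the dataset-splits row describes the downstream evaluation protocol: for each κ, a classifier is fit on latent codes with either the leading or the trailing dimensions kept, selected on the validation split, and scored on the test split. A rough scikit-learn sketch of that loop, using the linear-kernel SVM mentioned in the paper (the hyperparameter grid and variable names are assumptions):

```python
# Hypothetical evaluation loop: train a linear-kernel SVM on a slice of the
# latent codes, pick the best C on the validation split, report test accuracy.
from sklearn.svm import SVC


def accuracy_vs_kappa(z_train, y_train, z_val, y_val, z_test, y_test,
                      kappas, leading=True):
    results = {}
    for kappa in kappas:
        # Keep the leading kappa dimensions (z_{kappa+}) or the trailing ones.
        idx = slice(0, kappa) if leading else slice(-kappa, None)
        best_acc, best_clf = -1.0, None
        for C in (0.01, 0.1, 1.0, 10.0):  # assumed model-selection grid
            clf = SVC(kernel="linear", C=C).fit(z_train[:, idx], y_train)
            acc = clf.score(z_val[:, idx], y_val)
            if acc > best_acc:
                best_acc, best_clf = acc, clf
        results[kappa] = best_clf.score(z_test[:, idx], y_test)  # final test accuracy
    return results
```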
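Finally, the experiment-setup row mentions a projection method (the paper's Section 3.3) for aggregating the rotation matrices computed from different batches; the details of that method are not quoted here. One common way to realize such an aggregation, shown purely as an assumption, is to average the per-batch estimates and project the mean back onto SO(n) via SVD:

```python
# Hypothetical aggregation of per-batch rotation estimates: arithmetic mean
# followed by an SVD (Procrustes-style) projection back onto SO(n). Whether
# this matches the paper's Section 3.3 is an assumption.
import numpy as np


def aggregate_rotations(batch_rotations):
    """batch_rotations: list of (n, n) rotation matrices estimated per batch."""
    mean = np.mean(batch_rotations, axis=0)  # the mean generally leaves SO(n)
    u, _, vh = np.linalg.svd(mean)           # nearest orthogonal matrix is U V^T
    R = u @ vh
    if np.linalg.det(R) < 0:                 # flip a sign to land in SO(n), not just O(n)
        u[:, -1] *= -1
        R = u @ vh
    return R
```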