Nonparametric Canonical Correlation Analysis
Authors: Tomer Michaeli, Weiran Wang, Karen Livescu
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following experiments, we compare PLCCA/NCCA with linear CCA, two kernel CCA approximations using random Fourier features (FKCCA, (Lopez-Paz et al., 2014)) and Nyström approximation (NKCCA, (Williams & Seeger, 2001)) as described in (Wang et al., 2015b), and deep CCA (DCCA, (Andrew et al., 2013)). Illustrative example: We begin with the 2D synthetic dataset (1000 training samples) in Fig. 2(a,b), where samples of the two input manifolds are colored according to their common degree of freedom. Clearly, a linear mapping in view 1 cannot unfold the manifold to align the two views, and linear CCA indeed fails (results not shown). We extract a one-dimensional projection for each view using different nonlinear CCAs, and plot the projection g(y) vs. f(x) of test data (a different set of 1000 random samples from the same distribution) in Fig. 2(c-f). Since the second view is essentially a linear manifold (plus noise), for NKCCA we use a linear kernel in view 2 and a Gaussian kernel in view 1, and for DCCA we use a linear network for view 2 and two hidden layers of 512 ReLU units for view 1. Overall, NCCA achieves better alignment of the views while compressing the noise (variations not described by the common degree of freedom). While DCCA also succeeds in unfolding the view 1 manifold, it fails to compress the noise. |
| Researcher Affiliation | Academia | Tomer Michaeli (TOMER.M@EE.TECHNION.AC.IL), Technion - Israel Institute of Technology, Haifa, Israel; Weiran Wang (WEIRANWANG@TTIC.EDU), TTI-Chicago, Chicago, IL 60637, USA; Karen Livescu (KLIVESCU@TTIC.EDU), TTI-Chicago, Chicago, IL 60637, USA |
| Pseudocode | Yes | Algorithm 1 Nonparametric CCA with Gaussian KDE |
| Open Source Code | No | The paper does not provide any links to open-source code or state that the code is publicly available. |
| Open Datasets | Yes | The University of Wisconsin X-Ray Micro-Beam (XRMB) corpus (Westbury, 1994) ... Noisy MNIST handwritten digits dataset, generated identically to that of Wang et al. (2015b) but with a larger training set. View 1 inputs are randomly rotated images (28×28, grayscale) from the original MNIST dataset (LeCun et al., 1998) |
| Dataset Splits | Yes | we randomly shuffle the frames and generate splits of 30K/10K/11K frames for training/tuning/testing, and we refer to the result as the JW11-s setup ... We generate 450K/10K/10K pairs of images for training/tuning/testing |
| Hardware Specification | Yes | run time (in seconds) of the algorithms (measured with a single thread on a workstation with a 3.2GHz CPU and 56GB main memory) |
| Software Dependencies | No | The paper mentions specific tools like LIBSVM and refers to ReLU units, but does not specify version numbers for these or other software dependencies necessary for replication. |
| Experiment Setup | Yes | We extract 112D projections with each algorithm and measure the total correlation between the two views of the test set, after an additional 112D linear CCA. As in prior work, for both FKCCA and NKCCA we use rank-6000 approximations for the kernel matrices; for DCCA we use two ReLU (Nair & Hinton, 2010) hidden layers of width 1800/1200 for view 1/2 respectively and run stochastic optimization with minibatch size 750 as in (Wang et al., 2015a) for 100 epochs. Kernel widths for FKCCA/NKCCA, learning rate and momentum for DCCA, kernel widths and neighborhood sizes for NCCA/PLCCA are selected by grid search based on total tuning set correlation. Sensitivity to their values is mild over a large range; e.g., setting the kernel widths to 30-60% of the sample L2 norm gives similarly good results. For NCCA/PLCCA, input dimensionalities are first reduced by PCA to 20% of the original ones (except that PLCCA does not apply PCA for view 2 in order to extract a 112D projection). |
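As a point of reference for the experiments quoted above, the linear CCA baseline that the paper's nonlinear variants are compared against can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the authors' code; the function name `linear_cca` and the ridge regularizer `reg` are assumptions added for numerical stability.

```python
import numpy as np

def linear_cca(X, Y, k, reg=1e-6):
    """Top-k linear CCA via SVD of the whitened cross-covariance matrix.

    X: (n, dx) view-1 samples, Y: (n, dy) view-2 samples.
    Returns projection matrices A (dx, k), B (dy, k) and the
    top-k canonical correlations. `reg` is a small ridge term
    (an illustrative choice, not from the paper).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    T = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(T)
    A = inv_sqrt(Cxx) @ U[:, :k]   # view-1 projection directions
    B = inv_sqrt(Cyy) @ Vt[:k].T   # view-2 projection directions
    return A, B, s[:k]
```

On two views generated from a shared low-dimensional latent variable with small additive noise, the leading canonical correlations returned by such a sketch are close to 1; on the paper's synthetic manifold example, by contrast, no linear projection can align the views, which is what motivates the nonparametric variants.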