Deep Probabilistic Canonical Correlation Analysis
Authors: Mahdi Karami, Dale Schuurmans (pp. 8055-8063)
AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate the representation learning performance of the proposed method and compare against well established baselines in two scenarios: I) when several views are available at training time but only a single view (the primary view) is available at test time, namely the multi-view setting, and II) all views are available at training and testing time, namely the multi-modal setting. |
| Researcher Affiliation | Academia | Mahdi Karami, Dale Schuurmans Department of Computer Science University of Alberta Edmonton, Alberta, Canada {karami1, daes}@ualberta.ca |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | For the experimental study, we used the two-view noisy MNIST datasets of (Wang, Livescu, and Bilmes 2015) and (Wang et al. 2016), where the first view of the dataset was synthesized by randomly rotating each image, while the image of the second view was randomly sampled from the same class as the first view, but not necessarily the same image, and was then corrupted by random uniform noise. A two-modal dataset is built by pairing each image in the MNIST dataset with an arbitrary sample of the same class from the USPS dataset (Hull 1994), so that the images of both modalities share only the same digit identity but not the style of the handwriting. Multi-modal Facial Components: We also evaluated the proposed method on the multi-modal facial dataset used in (Abavisani and Patel 2018a), based on the Extended Yale B dataset (Lee, Ho, and Kriegman 2005), where 4 facial components (eyes, nose and mouth) and the whole face image formed 5 different modalities. (A hedged sketch of the two-view noisy MNIST construction appears below the table.) |
| Dataset Splits | Yes | The parameters of the SVM algorithm were tuned using the validation set and the classification error was measured on the test set. To comply with the experiments in (Wang, Livescu, and Bilmes 2015), the degree (number of neighbors) of the nodes was tuned in the set {5, 10, 20, 30, 50} using the validation set, and k-means was used as the last step to construct a final partitioning into 10 clusters in the embedding space. (A sketch of this evaluation protocol is given below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions techniques like 'stochastic drop-out' and 'linear SVM classification', but it does not specify any software libraries or dependencies with version numbers required for reproduction. |
| Experiment Setup | Yes | Experimental design: To provide a fair comparison, we used neural network architectures with the same capacity as those used in (Wang, Livescu, and Bilmes 2015) and (Wang et al. 2016). Accordingly, for the deep network models, all inference and decoding networks were composed of 3 fully connected nonlinear hidden layers of 1024 units, with ReLU gates used as the nonlinearity for all hidden units. The first and the second encoder specify (µ₁, diag(σ₁²)) = f₁(x₁; θ₁) and (µ₂, diag(σ₂²)) = f₂(x₂; θ₂), with the variances specified by a softplus function, and an extra encoder models the canonical correlations diag(pᵢ) using the sigmoid function as the output gate. Independent Bernoulli distributions and independent Gaussian distributions were selected to specify the likelihood functions of the first and the second view, respectively, with the parameters of each view being specified by its own decoder network; sigmoid functions were applied on outputs used to estimate the means of both views, while the variances of the Gaussian variables were specified by softplus functions. To prevent over-fitting, stochastic drop-out (Srivastava et al. 2014) was applied to all the layers as a regularization technique. The details of the experimental setup and training procedure can be found in Appendix F. (A hedged encoder sketch following this description appears after the table.) |
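
The following is a minimal sketch of the two-view noisy MNIST construction quoted in the Open Datasets row, assuming MNIST images arrive as an (N, 28, 28) float array in [0, 1] with integer labels. The ±45-degree rotation range, the unit noise level, and the function name are assumptions for illustration, not values taken from the paper or its code.

```python
import numpy as np
from scipy.ndimage import rotate

def make_two_view_noisy_mnist(images, labels, max_angle=45.0, seed=0):
    """Two-view noisy MNIST as described in the quoted text:
    view 1 = each digit randomly rotated; view 2 = a (generally different)
    image of the same class corrupted with uniform noise.
    The rotation range and noise level are assumptions."""
    rng = np.random.default_rng(seed)
    n = len(images)

    # View 1: rotate every image by a random angle.
    angles = rng.uniform(-max_angle, max_angle, size=n)
    view1 = np.stack([rotate(img, ang, reshape=False, mode="nearest")
                      for img, ang in zip(images, angles)])

    # View 2: pick another image of the same class, then add uniform noise.
    idx_by_class = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    partners = np.array([rng.choice(idx_by_class[c]) for c in labels])
    noise = rng.uniform(0.0, 1.0, size=images.shape)
    view2 = np.clip(images[partners] + noise, 0.0, 1.0)

    return view1.astype(np.float32), view2.astype(np.float32), labels
```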
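
The downstream evaluation quoted in the Dataset Splits row could be approximated as below, given precomputed train/validation/test embeddings. The C grid for the linear SVM and the use of the adjusted Rand index as the validation criterion for the graph degree are assumptions; the excerpt only states that tuning used the validation set.

```python
from sklearn.svm import LinearSVC
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score

def svm_test_error(z_tr, y_tr, z_val, y_val, z_te, y_te,
                   c_grid=(0.01, 0.1, 1.0, 10.0)):
    """Tune a linear SVM's C on the validation split, report test error.
    (The C grid is an assumed hyperparameter range.)"""
    val_acc = lambda c: LinearSVC(C=c).fit(z_tr, y_tr).score(z_val, y_val)
    best_c = max(c_grid, key=val_acc)
    return 1.0 - LinearSVC(C=best_c).fit(z_tr, y_tr).score(z_te, y_te)

def spectral_clusters(z, z_val, y_val, degrees=(5, 10, 20, 30, 50)):
    """Tune the neighborhood-graph degree on the validation embeddings
    (criterion assumed: adjusted Rand index), then cluster into 10 groups.
    scikit-learn's SpectralClustering applies k-means in the spectral
    embedding space when assign_labels='kmeans'."""
    cluster = lambda data, k: SpectralClustering(
        n_clusters=10, affinity="nearest_neighbors",
        n_neighbors=k, assign_labels="kmeans").fit_predict(data)
    best_k = max(degrees,
                 key=lambda k: adjusted_rand_score(y_val, cluster(z_val, k)))
    return cluster(z, best_k), best_k
```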
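
Finally, a minimal PyTorch sketch of the inference networks described in the Experiment Setup row: three fully connected ReLU layers of 1024 units per view, a Gaussian head with softplus-parameterised variance, an extra sigmoid-gated encoder for the canonical correlations, and dropout on all layers. The latent dimension, the dropout rate, and the input fed to the canonical-correlation encoder are assumptions not specified in this excerpt.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    """Per-view inference network: 3 fully connected ReLU layers of 1024 units
    producing (mu, diag(sigma^2)) with a softplus-parameterised variance.
    Latent size and dropout rate are assumptions, not values from the paper."""
    def __init__(self, in_dim, latent_dim=30, hidden=1024, dropout=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
        )
        self.mu_head = nn.Linear(hidden, latent_dim)
        self.var_head = nn.Linear(hidden, latent_dim)  # passed through softplus

    def forward(self, x):
        h = self.body(x)
        return self.mu_head(h), F.softplus(self.var_head(h))  # mean, diagonal variance

class CanonicalCorrelationEncoder(nn.Module):
    """Extra encoder producing canonical correlations in (0, 1) via a sigmoid gate.
    What it takes as input (e.g. both views concatenated) is an assumption."""
    def __init__(self, in_dim, latent_dim=30, hidden=1024):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, latent_dim))

    def forward(self, x):
        return torch.sigmoid(self.body(x))
```

Per the quoted setup, the decoders would mirror these networks, with sigmoid outputs for the means of both views and softplus outputs for the variances of the Gaussian view.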