Multi-View Representation Learning via Total Correlation Objective

Authors: HyeongJoo Hwang, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our approach in the multi-view translation and classification tasks, outperforming strong baseline methods." (Section 4, Experiments)
Researcher Affiliation | Academia | Hyeong Joo Hwang, Geon-Hyeong Kim, Seunghoon Hong, Kee-Eung Kim, KAIST. {hjhwang, ghkim}@ai.kaist.ac.kr, {seunghoon.hong, kekim}@kaist.ac.kr
Pseudocode | No | The paper describes the model's architecture and mathematical derivations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | "we will submit the code as the supplementary file."
Open Datasets | Yes | "All the datasets we used are public." "We employ Poly MNIST dataset [34]... We also evaluate our method on the multi-view dataset used in [24] where six visual features are extracted from images in Caltech-101 dataset."
Dataset Splits | Yes | "There are 60K tuples of training samples and 10K of test samples." "For all datasets, we follow the same preprocessing and training/test splits used in [50]."
Hardware Specification | No | The main paper does not explicitly describe the hardware used for the experiments. The ethics checklist indicates this information is provided in the supplementary material, but it is not present in the main body of the paper.
Software Dependencies | No | The paper names the types of models and classifiers used (e.g., CNN-based classifier, logistic regression) but does not provide version numbers for software dependencies or libraries.
Experiment Setup | Yes | "For simplicity, we model the encoder, decoder, and approximate marginal distributions using parameterized Gaussians with diagonal covariance matrices, i.e. $r^v_\psi(z \mid o_v) = \mathcal{N}(\mu_v, \sigma_v^2 I)$, $q^v_\phi(o_v \mid z) = \mathcal{N}(\hat{\mu}_v, I)$, and $r(z) = \mathcal{N}(0, I)$, respectively." Here $\alpha$ is the hyperparameter that trades off learning a minimal sufficient representation in favor of calibrating $r^v_\psi$. "We also note that our method achieves the best performance with $\alpha = 0.8$ when trained with incomplete data, while the best is achieved at $\alpha = 0.9$ when trained with complete data."
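To make the experiment-setup row concrete, below is a minimal PyTorch-style sketch of a diagonal-covariance Gaussian encoder $r^v_\psi(z \mid o_v) = \mathcal{N}(\mu_v, \sigma_v^2 I)$ together with the standard-normal prior $r(z) = \mathcal{N}(0, I)$ quoted above. The module name, layer sizes, and helper functions are assumptions for illustration only, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class GaussianEncoder(nn.Module):
    """View-specific encoder r_psi^v(z | o_v) = N(mu_v, sigma_v^2 I).

    Hypothetical sketch: the MLP architecture and hidden_dim are
    assumptions, not details reported in the paper.
    """

    def __init__(self, input_dim, latent_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.log_var = nn.Linear(hidden_dim, latent_dim)  # diagonal covariance

    def forward(self, o_v):
        h = self.net(o_v)
        return self.mu(h), self.log_var(h)


def reparameterize(mu, log_var):
    """Sample z ~ N(mu, sigma^2 I) via the reparameterization trick."""
    std = torch.exp(0.5 * log_var)
    return mu + std * torch.randn_like(std)


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2 I) || N(0, I) ), with r(z) = N(0, I) as in the paper."""
    return 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var, dim=-1)
```

In such a setup the decoder $q^v_\phi(o_v \mid z) = \mathcal{N}(\hat{\mu}_v, I)$ would be a second network predicting only a mean, and the $\alpha$ hyperparameter would weight the regularization terms of the total correlation objective; the weighting scheme here is left unspecified because the exact loss composition is not reproduced in this summary.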