On Deep Multi-View Representation Learning
Authors: Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze several techniques based on prior work, as well as new variants, and compare them experimentally on visual, speech, and language domains. To our knowledge this is the first head-to-head comparison of a variety of such techniques on multiple tasks. |
| Researcher Affiliation | Academia | Weiran Wang WEIRANWANG@TTIC.EDU Toyota Technological Institute at Chicago Raman Arora ARORA@CS.JHU.EDU Johns Hopkins University Karen Livescu KLIVESCU@TTIC.EDU Toyota Technological Institute at Chicago Jeff Bilmes BILMES@EE.WASHINGTON.EDU University of Washington, Seattle |
| Pseudocode | No | The paper describes the algorithms mathematically and conceptually but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To facilitate future work, we release our implementations and a new benchmark dataset of simulated two-view data based on MNIST. |
| Open Datasets | Yes | In this task, we generate two-view data using the MNIST dataset (LeCun et al., 1998), which consists of 28x28 grayscale digit images, with 60K/10K images for training/testing. ... We split the XRMB speakers into disjoint sets of 35/8/2/2 speakers for feature learning/recognizer training/tuning/testing. ... We follow the setup of Faruqui & Dyer (2014) and Lu et al. (2015), and use as inputs 640-dimensional monolingual word vectors trained via latent semantic analysis on the WMT 2011 monolingual news corpora and use the same 36K English-German word pairs for multi-view learning. |
| Dataset Splits | Yes | The original training set is further split into training/tuning sets of size 50K/10K. ... We split the XRMB speakers into disjoint sets of 35/8/2/2 speakers for feature learning/recognizer training/tuning/testing. ... We evaluate on the bigram similarity dataset of Mitchell & Lapata (2010), using the adjective-noun (AN) and verb-object (VN) subsets, and tuning and test splits (of size 649/1,972) for each subset |
| Hardware Specification | Yes | The Tesla K40 GPUs used for this research were donated by NVIDIA Corporation. |
| Software Dependencies | No | The paper mentions general tools like 'SGD', 'linear SVMs (Chang & Lin, 2011)', and 'restricted Boltzmann machines (Hinton & Salakhutdinov, 2006)', but does not provide specific version numbers for software components or libraries. |
| Experiment Setup | Yes | For DNN-based models, feature mappings (f, g) are implemented by networks of 3 hidden layers, each of 1,024 sigmoid units, and a linear output layer of L units; reconstruction mappings (p, q) are implemented by networks of 3 hidden layers, each of 1,024 sigmoid units, and an output layer of 784 sigmoid units. We fix r_x = r_y = 10^-4 for DCCA and DCCAE. For SplitAE/CorrAE/DCCAE/DistAE we select the trade-off parameter λ via grid search. ... we use SGD for optimization with minibatch size, learning rate and momentum tuned on the tuning set. A small weight decay parameter of 10^-4 is used for all layers. |
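The Experiment Setup row above specifies the network shapes exactly: a feature mapping with 3 hidden layers of 1,024 sigmoid units and a linear output of L units, and a reconstruction mapping with 3 hidden layers of 1,024 sigmoid units and a 784-unit sigmoid output (MNIST pixels). Below is a minimal NumPy sketch of those two mappings, forward pass only; the value of L, the initialization scale, and the function names are illustrative assumptions, not taken from the paper, and training (SGD with the CCA or autoencoder objectives) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_layers(sizes, rng):
    # One (W, b) pair per consecutive pair of layer sizes.
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, output_activation):
    h = x
    for W, b in layers[:-1]:
        h = sigmoid(h @ W + b)       # hidden layers: sigmoid units
    W, b = layers[-1]
    return output_activation(h @ W + b)

# Shapes per the paper's setup; L = 10 is a hypothetical choice.
L = 10
rng = np.random.default_rng(0)
f_layers = init_layers([784, 1024, 1024, 1024, L], rng)    # feature mapping f
p_layers = init_layers([L, 1024, 1024, 1024, 784], rng)    # reconstruction p

x = rng.random((5, 784))                                   # minibatch of 5 images
feats = forward(x, f_layers, lambda z: z)                  # linear output layer
recon = forward(feats, p_layers, sigmoid)                  # sigmoid output layer
print(feats.shape, recon.shape)                            # (5, 10) (5, 784)
```

The second view's mappings (g, q) would be built the same way with that view's input dimensionality in place of 784.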