On Deep Multi-View Representation Learning

Authors: Weiran Wang, Raman Arora, Karen Livescu, Jeff Bilmes

ICML 2015

Reproducibility Variable: Result (followed by the supporting LLM response)

Research Type: Experimental
  "We analyze several techniques based on prior work, as well as new variants, and compare them experimentally on visual, speech, and language domains. To our knowledge this is the first head-to-head comparison of a variety of such techniques on multiple tasks."

Researcher Affiliation: Academia
  Weiran Wang (weiranwang@ttic.edu), Toyota Technological Institute at Chicago; Raman Arora (arora@cs.jhu.edu), Johns Hopkins University; Karen Livescu (klivescu@ttic.edu), Toyota Technological Institute at Chicago; Jeff Bilmes (bilmes@ee.washington.edu), University of Washington, Seattle.

Pseudocode: No
  The paper describes the algorithms mathematically and conceptually but does not include structured pseudocode or algorithm blocks.

Open Source Code: Yes
  "To facilitate future work, we release our implementations and a new benchmark dataset of simulated two-view data based on MNIST."

Open Datasets: Yes
  "In this task, we generate two-view data using the MNIST dataset (LeCun et al., 1998), which consists of 28x28 grayscale digit images, with 60K/10K images for training/testing." ... "We split the XRMB speakers into disjoint sets of 35/8/2/2 speakers for feature learning/recognizer training/tuning/testing." ... "We follow the setup of Faruqui & Dyer (2014) and Lu et al. (2015), and use as inputs 640-dimensional monolingual word vectors trained via latent semantic analysis on the WMT 2011 monolingual news corpora, and use the same 36K English-German word pairs for multi-view learning."

Dataset Splits: Yes
  "The original training set is further split into training/tuning sets of size 50K/10K." ... "We split the XRMB speakers into disjoint sets of 35/8/2/2 speakers for feature learning/recognizer training/tuning/testing." ... "We evaluate on the bigram similarity dataset of Mitchell & Lapata (2010), using the adjective-noun (AN) and verb-object (VN) subsets, and tuning and test splits (of size 649/1,972) for each subset."

Hardware Specification: Yes
  "The Tesla K40 GPUs used for this research were donated by NVIDIA Corporation."

Software Dependencies: No
  The paper mentions general tools such as SGD, linear SVMs (Chang & Lin, 2011), and restricted Boltzmann machines (Hinton & Salakhutdinov, 2006), but does not give version numbers for any software components or libraries.

Experiment Setup: Yes
  "For DNN-based models, feature mappings (f, g) are implemented by networks of 3 hidden layers, each of 1,024 sigmoid units, and a linear output layer of L units; reconstruction mappings (p, q) are implemented by networks of 3 hidden layers, each of 1,024 sigmoid units, and an output layer of 784 sigmoid units. We fix rx = ry = 10^-4 for DCCA and DCCAE. For SplitAE/CorrAE/DCCAE/DistAE we select the trade-off parameter λ via grid search." ... "We use SGD for optimization with minibatch size, learning rate and momentum tuned on the tuning set. A small weight decay parameter of 10^-4 is used for all layers."
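The quoted experiment setup fully determines the shapes of the feature and reconstruction networks. The following minimal numpy sketch (not the authors' released code) builds mappings with exactly those layer sizes; the weight initialization and the choice L = 10 are illustrative assumptions, since the paper tunes or sets L per task.

```python
import numpy as np

# Sketch of the DNN mappings described in the setup: feature mappings (f, g)
# have 3 hidden layers of 1,024 sigmoid units and a linear output of L units;
# reconstruction mappings (p, q) have 3 hidden layers of 1,024 sigmoid units
# and 784 sigmoid outputs (one per MNIST pixel). Initialization and L = 10
# are assumptions, not taken from the paper.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(sizes, rng):
    """Random weights for a fully connected net with the given layer sizes."""
    return [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, last_linear):
    """Apply sigmoid layers; the final layer is linear if last_linear is set."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if last_linear and i == len(params) - 1:
            continue  # linear output layer, used by the feature mappings f, g
        x = sigmoid(x)
    return x

rng = np.random.default_rng(0)
L = 10                                         # assumed feature dimensionality
f = init_mlp([784, 1024, 1024, 1024, L], rng)  # feature mapping f
p = init_mlp([L, 1024, 1024, 1024, 784], rng)  # reconstruction mapping p

x = rng.random((5, 784))                   # batch of 5 flattened 28x28 images
h = forward(f, x, last_linear=True)        # projected features, shape (5, L)
x_hat = forward(p, h, last_linear=False)   # reconstruction, shape (5, 784)
```

The same constructor with `[784, 1024, 1024, 1024, L]` and `[L, 1024, 1024, 1024, 784]` serves for g and q on the second view.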