Stochastic Optimization for Multiview Representation Learning using Partial Least Squares

Authors: Raman Arora, Poorya Mianjy, Teodor Marinov

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (the held-out objective evaluation is sketched after this table) | In this section, we evaluate the performance of our methods against other stochastic baselines discussed in Section 2, in terms of the progress made on the objective as a function of the number of iterations as well as the CPU runtime, on both synthetic and real-world datasets. [...] Figure 1 shows the PLS objective as a function of the number of iterations (samples processed) as well as CPU runtime, for target dimensionality k ∈ {2, 4, 8}. [...] Figure 2 shows the PLS objective, as a function of the number of samples processed (iterations) as well as CPU runtime, for ranks k ∈ {2, 4, 8}.
Researcher Affiliation | Academia | Raman Arora ARORA@CS.JHU.EDU Poorya Mianjy MIANJY@JHU.EDU Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218. Teodor V. Marinov T.V.MARINOV@SMS.ED.AC.UK School of Informatics, University of Edinburgh, Edinburgh UK, EH8 9AB
Pseudocode | Yes (an illustrative MSG-style update is sketched after this table) | Algorithm 1 Matrix Stochastic Gradient [...] Algorithm 2 Matrix Exponentiated Gradient
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | In this section, we discuss experiments on the University of Wisconsin X-ray Microbeam (XRMB) Database (Westbury, 1994).
Dataset Splits | Yes | Each view is split into training, tuning and testing sets, each of size n. [...] Because we cannot evaluate the true population objective for Problem 1, we instead approximate them by evaluating on a held-out testing sample (half of the dataset, with the other half being used for training). All results are averaged over 50 random train/test splits.
Hardware Specification | No | No specific hardware details (like CPU/GPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions, or solver versions) needed to replicate the experiments.
Experiment Setup | Yes (a step-size tuning sketch follows after this table) | We tune the initial learning rate parameter η0 for each algorithm over the set {0.001, 0.01, 0.1, 1, 10}. All algorithms were run for only one pass over the training data. [...] we deliberately set all initial learning rates η0 = 1, choosing ηt = 1/t uniformly for all experiments.
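
The Research Type and Dataset Splits rows describe tracking the PLS objective on a held-out half of the data, with results averaged over 50 random train/test splits. Below is a minimal Python/NumPy sketch of that evaluation, assuming the iterate is represented by factor matrices U and V with orthonormal columns; the function and variable names are illustrative and not from the paper, which releases no code (see the Open Source Code row).

    import numpy as np

    def empirical_pls_objective(U, V, X_test, Y_test):
        """Held-out PLS objective tr(U^T C_xy V), where C_xy is the empirical
        cross-covariance of the test samples (rows of X_test and Y_test).

        U: (d_x, k) and V: (d_y, k) are assumed to have orthonormal columns.
        """
        n = X_test.shape[0]
        C_hat = (X_test.T @ Y_test) / n        # empirical cross-covariance estimate
        return float(np.trace(U.T @ C_hat @ V))

In the paper's protocol, this quantity is recorded as a function of iterations and CPU runtime and then averaged over the 50 random splits.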
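The Pseudocode row cites Algorithm 1 (Matrix Stochastic Gradient) and Algorithm 2 (Matrix Exponentiated Gradient). The sketch below shows one plausible single-pass MSG-style update for PLS, assuming a convex relaxation in which the iterate M has singular values kept in [0, 1] with sum at most k; the exact constraint set, projection, and step-size rule of Algorithm 1 may differ, and the MEG variant (multiplicative updates via a matrix exponential) is not shown.

    import numpy as np

    def project_singular_values(sigma, k, tol=1e-8):
        """Project singular values onto {s : 0 <= s_i <= 1, sum(s) <= k} by
        clipping and, if needed, bisecting on a uniform downward shift.
        (A stand-in for the projection step assumed in Algorithm 1.)"""
        s = np.clip(sigma, 0.0, 1.0)
        if s.sum() <= k:
            return s
        lo, hi = 0.0, float(sigma.max())
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if np.clip(sigma - mid, 0.0, 1.0).sum() > k:
                lo = mid
            else:
                hi = mid
        return np.clip(sigma - hi, 0.0, 1.0)

    def msg_pls(X, Y, k, eta0=1.0):
        """Single pass of a matrix-stochastic-gradient-style update for PLS:
        take a stochastic gradient step on the iterate M using the rank-one
        estimate x_t y_t^T, then project M back onto the relaxed constraint
        set via its SVD."""
        M = np.zeros((X.shape[1], Y.shape[1]))
        for t, (x, y) in enumerate(zip(X, Y), start=1):
            eta_t = eta0 / t                          # mirrors the eta_t = 1/t schedule quoted above
            M = M + eta_t * np.outer(x, y)            # stochastic gradient of the PLS objective
            U, sigma, Vt = np.linalg.svd(M, full_matrices=False)
            M = (U * project_singular_values(sigma, k)) @ Vt
        U, _, Vt = np.linalg.svd(M, full_matrices=False)
        return U[:, :k], Vt[:k].T                     # rank-k left/right factors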
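The Experiment Setup row quotes tuning the initial step size η0 over {0.001, 0.01, 0.1, 1, 10} with a single pass over the training data. The snippet below is a hypothetical tuning loop built on the two sketches above; X_train, Y_train, X_tune, Y_tune and the choice k = 4 are placeholder names and values, not from the paper.

    # Hypothetical step-size tuning over the grid quoted in the Experiment Setup row:
    # run one pass of the MSG sketch for each eta0 and keep the value that gives the
    # best objective on the tuning split (placeholder arrays X_tune, Y_tune).
    grid = [0.001, 0.01, 0.1, 1.0, 10.0]
    best_eta0 = max(
        grid,
        key=lambda eta0: empirical_pls_objective(
            *msg_pls(X_train, Y_train, k=4, eta0=eta0), X_tune, Y_tune
        ),
    )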