ℓ0-Sparse Canonical Correlation Analysis
Authors: Ofir Lindenbaum, Moshe Salhov, Amir Averbuch, Yuval Kluger
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of the proposed approach on a wide range of tasks. First, using synthetic data, we demonstrate that ℓ0-CCA correctly identifies the canonical vectors in a challenging regime of N < D_x, D_y. Next, using a coupled video dataset, we demonstrate that ℓ0-CCA can identify the common information from high-dimensional data and embed it into correlated, low-dimensional representations. Then, we use noisy images from MNIST and multi-channel seismic data to demonstrate that ℓ0-DCCA finds meaningful representations of the data even in a noisy regime. Finally, we use ℓ0-DCCA to improve cancer sub-type classification using high-dimensional genetic measurements. |
| Researcher Affiliation | Collaboration | Ofir Lindenbaum, Faculty of Engineering, Bar Ilan University, Ramat Gan, Israel (ofirlin@gmail.com); Moshe Salhov & Amir Averbuch, School of Computer Science, Tel Aviv University, Tel Aviv, Israel; Yuval Kluger, School of Medicine, Yale University, New Haven, CT, USA (yuval.kluger@yale.edu). M.S. is co-affiliated with Playtika, Israel. |
| Pseudocode | Yes | In Algorithm 1 we provide a pseudocode description of the proposed approach. (A hedged code sketch of the algorithm's core idea appears below the table.) |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found. Footnotes refer to code for baselines, not the authors' own implementation. |
| Open Datasets | Yes | We use two noisy variants of MNIST (LeCun et al., 2010) as our coupled views. Next, we evaluate the method using a dataset of seismic events studied by (Lindenbaum et al., 2018). Here, we use multi-modal observations from the METABRIC data (Curtis et al., 2012) and attempt to find correlated representations to improve cancer sub-type classification. To generate samples from X, Y ∈ R^{D×N}, we follow the procedure described in (Suo et al., 2017). (A sketch of this synthetic-data procedure appears below the table.) |
| Dataset Splits | Yes | In all experiments, validation sets are used for tuning the hyperparameters of all baselines by maximizing the total correlation on the validation set. Each view consists of 62,000 samples, of which we use 40,000 for training, 12,000 for testing, and 10,000 as a validation set. (A split sketch appears below the table.) |
| Hardware Specification | Yes | All the experiments were conducted using Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.4GHz x2 (12 cores total). |
| Software Dependencies | No | The paper mentions implementing baselines and refers to GitHub repositories for some comparison methods but does not provide specific version numbers for its own software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the linear model we use a learning rate of 0.005 with 10,000 epochs. The values of λ_x and λ_y are both set to 30. We use a learning rate of 0.01 with 2,000 epochs. The number of neurons for the five hidden layers is 300, 200, 100, 50, and 40, respectively, with a tanh activation after each layer. (An architecture sketch appears below the table.) |
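
Since the paper's Algorithm 1 is available only as pseudocode, the following is a minimal, hypothetical PyTorch sketch of the core idea behind ℓ0-CCA: elementwise stochastic gates (a Gaussian-based relaxation of Bernoulli variables) multiply each view's features, and the gate parameters are trained to maximize the correlation of the gated projections while an expected-ℓ0 penalty, weighted by λ_x and λ_y, prunes features. All names here (`StochasticGates`, `neg_corr`, `l0_cca_step`) are ours, and the single-component linear objective simplifies the paper's full CCA objective.

```python
import math
import torch

class StochasticGates(torch.nn.Module):
    """Relaxed Bernoulli gates: z_d = clamp(mu_d + eps_d, 0, 1) with eps_d ~ N(0, sigma^2)."""
    def __init__(self, dim, sigma=0.5):
        super().__init__()
        self.mu = torch.nn.Parameter(0.5 * torch.ones(dim))
        self.sigma = sigma

    def forward(self, x):                       # x: (N, dim)
        eps = self.sigma * torch.randn_like(self.mu) if self.training else 0.0
        z = torch.clamp(self.mu + eps, 0.0, 1.0)
        return x * z                            # gate each feature

    def expected_l0(self):
        # E[||z||_0] = sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma), Phi the Gaussian CDF
        return (0.5 * (1.0 + torch.erf(self.mu / (self.sigma * math.sqrt(2.0))))).sum()

def neg_corr(a, b, eps=1e-8):
    """Negative Pearson correlation between two 1-D projections (minimized)."""
    a, b = a - a.mean(), b - b.mean()
    return -(a * b).sum() / (a.norm() * b.norm() + eps)

def l0_cca_step(X, Y, gx, gy, u, v, opt, lam_x, lam_y):
    """One gradient step: maximize gated correlation, penalize the expected gate count."""
    loss = (neg_corr(gx(X) @ u, gy(Y) @ v)
            + lam_x * gx.expected_l0() + lam_y * gy.expected_l0())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Under this reading, a run would repeat `l0_cca_step` for the quoted 10,000 epochs at learning rate 0.005; after training, features whose gate means fall to μ_d ≤ 0 are exactly zeroed out, which is what makes the sparsity ℓ0 rather than ℓ1.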
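
The quoted generation procedure of (Suo et al., 2017) draws the two views jointly Gaussian with a rank-one cross-covariance built from sparse canonical vectors. The NumPy sketch below is an illustrative reading of that procedure; the dimensions, the correlation strength `rho`, the sparsity pattern, and the identity within-view covariances are our assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, rho, k = 100, 50, 0.9, 5                  # N < D: the challenging regime

u = np.zeros(D); u[:k] = 1.0; u /= np.linalg.norm(u)   # sparse canonical vector for X
v = np.zeros(D); v[-k:] = 1.0; v /= np.linalg.norm(v)  # sparse canonical vector for Y

Sx = Sy = np.eye(D)                             # within-view covariances (identity here)
Sxy = rho * Sx @ np.outer(u, v) @ Sy            # rank-one cross-covariance
cov = np.block([[Sx, Sxy], [Sxy.T, Sy]])        # joint covariance of (x, y)

Z = rng.multivariate_normal(np.zeros(2 * D), cov, size=N)
X, Y = Z[:, :D].T, Z[:, D:].T                   # views in R^{D×N}, matching the quote
```

Because `u` and `v` have unit norm under the identity covariances, any |rho| ≤ 1 keeps the joint covariance positive semi-definite, and (u, v) is the leading canonical pair with correlation rho.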
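
The quoted 40,000/12,000/10,000 split of the 62,000 coupled noisy-MNIST samples can be reproduced with a fixed permutation, as in the sketch below; the seed, and whether the authors shuffled before splitting, are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)                 # seed is an assumption
idx = rng.permutation(62_000)
train_idx = idx[:40_000]
test_idx  = idx[40_000:52_000]
val_idx   = idx[52_000:]                        # remaining 10,000 for validation
```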
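
The deep (ℓ0-DCCA) encoder quoted above, five hidden layers of 300, 200, 100, 50, and 40 units with a tanh after each, is straightforward to write in PyTorch. The input and output dimensions, the final linear projection, and the choice of Adam are placeholders and assumptions; the quoted setup fixes only the widths, the activation, the learning rate of 0.01, and the 2,000 epochs.

```python
import torch
import torch.nn as nn

def make_encoder(in_dim, out_dim):
    widths = [in_dim, 300, 200, 100, 50, 40]    # five hidden layers, as quoted
    layers = []
    for a, b in zip(widths[:-1], widths[1:]):
        layers += [nn.Linear(a, b), nn.Tanh()]  # tanh activation after each layer
    layers.append(nn.Linear(40, out_dim))       # output projection (assumed)
    return nn.Sequential(*layers)

encoder = make_encoder(in_dim=784, out_dim=10)              # e.g., one noisy-MNIST view
optimizer = torch.optim.Adam(encoder.parameters(), lr=0.01) # lr as quoted; run for 2,000 epochs
```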