Generalized Leverage Scores: Geometric Interpretation and Applications

Authors: Bruno Ordozgoiti, Antonis Matakos, Aristides Gionis

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We run numerical experiments to provide further insight on the proposed methods. We conduct experiments to gain further insight about our results.
Researcher Affiliation Academia 1School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom 2Deparment of Computer Science, Aalto University, Finland 3Division of Theoretical Computer Science, KTH Royal Institute of Technology, Sweden.
Pseudocode Yes Algorithm 1 Generalized deterministic leverage score sampling for GCSS (Problem 2). Algorithm 2 Generalized deterministic leverage score sampling for Sparse CCA.
Open Source Code No The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets Yes We evaluate the performance of Algorithm 1 on a collection of real datasets, obtained from a repository maintained by Arizona State University for feature-selection tasks.1 (footnote 1: https://jundongl.github.io/ scikit-feature/datasets.html)
Dataset Splits No The paper states: 'We split each dataset in two. The first half of the columns acts as matrix A (see def. of GCSS), and the second as B.' This describes how the data is used for the specific problem setup, but does not provide standard training/validation/test dataset splits for model reproduction.
Hardware Specification No The paper discusses computational complexity and running time but does not provide specific details about the hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies No The paper mentions utilizing 'off-the-shelf software' for computing SVD but does not provide specific version numbers for any software dependencies or libraries used in their implementation.
Experiment Setup Yes To avoid large values of the ratio σω / σµ , we only retain singular vectors paired with the leading singular values accounting for at least 75% of the total squared norm. We thus try setting the size of R to a number of values: 1/10, 1/4 and 1/2 of the retained rank. For each resulting R we compute generalized leverage scores. We use the generalized-leverage-score ordering to build a column submatrix Ck of size k, for a range of k (depending on the dataset size).