Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

Authors: Thomas T.C.K. Zhang, Leonardo Felipe Toso, James Anderson, Nikolai Matni

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We verify the vital importance of DFW on various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for iid, but mildly non-isotropic data.' and 'We present numerical experiments to demonstrate the key importance of aspects of our proposed algorithm.'
Researcher Affiliation | Academia | Thomas T.C.K. Zhang (University of Pennsylvania, Philadelphia, PA; ttz2@seas.upenn.edu), Leonardo F. Toso (Columbia University, New York, NY; lt2879@columbia.edu), James Anderson (Columbia University, New York, NY; james.anderson@columbia.edu), Nikolai Matni (University of Pennsylvania, Philadelphia, PA; nmatni@seas.upenn.edu)
Pseudocode | Yes | Algorithm 1: De-biased & Feature-whitened (DFW) Alternating Minimization-Descent (a hedged NumPy sketch of this loop follows the table)
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository.
Open Datasets | No | The paper describes generating synthetic data for its experiments ('We generate the T operators', 'A non-isotropic covariance matrix... is generated'), but does not provide access information (link, DOI, formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper mentions generating a 'validation set' for one specific experiment and partitioning data within its algorithm (Algorithm 1), but it does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the overall data partitioning for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | 'For each optimization iteration, we sample N = 100 fresh data per task.' 'We set the operator dimensions and rank as d_x = d_y = 50 and r = 7.' 'We set the state dimension d_x = 25, control dimension d_u = 2, latent dimension r = 6, horizon N = 100, and input variance σ_u^2 = 1.' 'In particular, we consider a ReLU-activated network with one hidden layer of dimension 64.' (A rollout sketch for the system-identification setup also follows the table.)
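
The table confirms that pseudocode for Algorithm 1 (DFW) appears in the paper but that no source code is released. The following is a minimal NumPy sketch, not the authors' implementation, of the pattern the Pseudocode row names: per task, fit least-squares weights on one half of a fresh batch (minimization step), then take a de-biased, feature-whitened descent step on the shared representation using the independent second half. The data model, the exact preconditioning, the step size, and the task count T = 25 are illustrative assumptions; only the dimensions d_x = d_y = 50, r = 7, N = 100 come from the report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions quoted in the Experiment Setup row; T is an assumption.
d_x, d_y, r, N, T = 50, 50, 7, 100, 25

# Synthetic ground truth: shared representation and per-task weights.
Phi_star = np.linalg.qr(rng.standard_normal((d_x, r)))[0].T  # r x d_x
F_star = rng.standard_normal((T, d_y, r))

# A non-isotropic input covariance (illustrative construction).
A = rng.standard_normal((d_x, d_x))
Sigma = A @ A.T / d_x + 0.1 * np.eye(d_x)
Sigma_sqrt = np.linalg.cholesky(Sigma)

def sample_task(t, n):
    """Fresh non-isotropic data for task t: y = F*_t Phi* x + noise."""
    X = rng.standard_normal((n, d_x)) @ Sigma_sqrt.T
    Y = X @ Phi_star.T @ F_star[t].T + 0.1 * rng.standard_normal((n, d_y))
    return X, Y

# DFW-style alternating minimization-descent (hedged sketch).
Phi = np.linalg.qr(rng.standard_normal((d_x, r)))[0].T
eta = 0.5
for it in range(200):
    G = np.zeros((r, d_x))
    for t in range(T):
        X, Y = sample_task(t, N)
        X1, Y1 = X[: N // 2], Y[: N // 2]
        X2, Y2 = X[N // 2 :], Y[N // 2 :]
        # (1) Minimization: least-squares task weights on the first half.
        F_t = np.linalg.lstsq(X1 @ Phi.T, Y1, rcond=None)[0].T  # d_y x r
        # (2) De-biased gradient on the independent second half,
        # feature-whitened: left-precondition by (F^T F)^{-1} and
        # right-precondition by the empirical input covariance inverse.
        n2 = X2.shape[0]
        Sig2 = X2.T @ X2 / n2
        R = X2 @ Phi.T @ F_t.T - Y2                      # residuals
        G_t = np.linalg.solve(F_t.T @ F_t, F_t.T @ (R.T @ X2) / n2)
        G += np.linalg.solve(Sig2.T, G_t.T).T            # G_t @ Sig2^{-1}
    # (3) Descent step on Phi, then re-orthonormalize its rows.
    Phi = np.linalg.qr((Phi - eta * G / T).T)[0].T

# Subspace distance to the ground truth as a sanity check.
err = np.linalg.norm(Phi_star.T - Phi.T @ (Phi @ Phi_star.T), 2)
print(f"subspace error: {err:.3e}")
```

The two-half split mirrors the de-biasing idea the algorithm's name suggests: the weights F_t and the covariance/residual statistics are computed on independent samples, so their errors do not correlate in the representation update.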
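
The Experiment Setup row also quotes a system-identification configuration (d_x = 25, d_u = 2, r = 6, N = 100, σ_u^2 = 1) whose trajectory data is sequentially dependent, i.e. the non-IID regime in the paper's title. Below is a hedged sketch of how such rollouts might be generated; the stable-system construction (spectral radius 0.9) and the process-noise scale σ_w = 0.1 are assumptions for illustration, not the paper's exact task distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Dimensions quoted in the report's Experiment Setup row.
d_x, d_u, N = 25, 2, 100
sigma_u = 1.0  # input variance sigma_u^2 = 1

def rollout(A, B, horizon, sigma_u=1.0, sigma_w=0.1):
    """Roll out x_{k+1} = A x_k + B u_k + w_k with Gaussian inputs.
    Returns regressors z_k = (x_k, u_k) and targets x_{k+1};
    consecutive samples are dependent through the state, hence non-IID."""
    x = np.zeros(A.shape[0])
    Z, X_next = [], []
    for _ in range(horizon):
        u = sigma_u * rng.standard_normal(B.shape[1])
        x_new = A @ x + B @ u + sigma_w * rng.standard_normal(A.shape[0])
        Z.append(np.concatenate([x, u]))
        X_next.append(x_new)
        x = x_new
    return np.asarray(Z), np.asarray(X_next)

# Illustrative stable system; the paper's operator generation may differ.
A = rng.standard_normal((d_x, d_x))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius 0.9
B = rng.standard_normal((d_x, d_u))

Z, X_next = rollout(A, B, N, sigma_u)
print(Z.shape, X_next.shape)  # (100, 27) (100, 25)
```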