Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data
Authors: Thomas T.C.K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We verify the vital importance of DFW on various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for iid, but mildly non-isotropic data." and "We present numerical experiments to demonstrate the key importance of aspects of our proposed algorithm." |
| Researcher Affiliation | Academia | Thomas T.C.K. Zhang University of Pennsylvania Philadelphia, PA ttz2@seas.upenn.edu Leonardo F. Toso Columbia University New York, NY lt2879@columbia.edu James Anderson Columbia University New York, NY james.anderson@columbia.edu Nikolai Matni University of Pennsylvania Philadelphia, PA nmatni@seas.upenn.edu |
| Pseudocode | Yes | Algorithm 1 De-biased & Feature-whitened (DFW) Alt. Minimization-Descent |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described in this paper, nor does it provide a direct link to a source-code repository. |
| Open Datasets | No | The paper describes generating synthetic data for its experiments ('We generate the T operators', 'A non-isotropic covariance matrix... is generated'), but does not provide access information (link, DOI, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions generating a 'validation set' for one specific experiment and partitioning data within its algorithm (Algorithm 1), but it does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the overall data partitioning for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | "For each optimization iteration, we sample N = 100 fresh data per task." "We set the operator dimensions and rank as dx = dy = 50 and r = 7." "We set the state dimension dx = 25, control dimension du = 2, latent dimension r = 6, horizon N = 100, and input variance σ_u^2 = 1." "In particular, we consider a ReLU-activated network with one hidden layer of dimension 64." |
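The paper's synthetic setup (rank-r operators over a shared latent representation, with non-isotropic covariates) could plausibly be reconstructed from the reported parameters alone. The sketch below is a hypothetical reconstruction, not the authors' code: the function names, the QR-based construction of the shared representation, and the condition number of the covariance are all assumptions; only the dimensions dx = dy = 50, r = 7, and N = 100 come from the paper.

```python
import numpy as np

def make_task_operators(T=5, dx=50, dy=50, r=7, seed=0):
    """Generate T rank-r operators F_t @ Phi sharing one r-dim representation.

    dx, dy, r match the paper's reported values; T and the Gaussian
    construction are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Shared orthonormal representation Phi (r x dx) via QR of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((dx, r)))
    Phi = Q.T  # r x dx, rows orthonormal
    # Task-specific weights F_t (dy x r); each full operator is dy x dx, rank r.
    return [rng.standard_normal((dy, r)) @ Phi for _ in range(T)]

def sample_nonisotropic(N=100, dx=50, cond=10.0, seed=1):
    """Sample N covariates with a non-isotropic covariance (condition number `cond`).

    N = 100 fresh samples per iteration matches the paper; `cond` is assumed.
    """
    rng = np.random.default_rng(seed)
    eigs = np.linspace(1.0, cond, dx)           # spread-out spectrum -> non-isotropic
    U, _ = np.linalg.qr(rng.standard_normal((dx, dx)))
    sqrt_cov = U @ np.diag(np.sqrt(eigs)) @ U.T  # symmetric square root of covariance
    return rng.standard_normal((N, dx)) @ sqrt_cov

ops = make_task_operators()
X = sample_nonisotropic()
```

A reconstruction along these lines would at least let a reader probe the claimed failure mode of vanilla alternating minimization-descent under mild non-isotropy, even without the authors' exact data-generation script.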