Nonparametric Evaluation of Noisy ICA Solutions
Authors: Syamantak Kumar, Derek Bean, Peter J. Bickel, Purnamrita Sarkar
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we develop a nonparametric score to adaptively pick the right algorithm for ICA with arbitrary Gaussian noise. The novelty of this score stems from the fact that it just assumes a finite second moment of the data and uses the characteristic function to evaluate the quality of the estimated mixing matrix without any knowledge of the parameters of the noise distribution. In addition, we propose some new contrast functions and algorithms that enjoy the same fast computability as existing algorithms like FASTICA and JADE but work in domains where the former may fail. While these also may have weaknesses, our proposed diagnostic, as shown by our simulations, can remedy them. Finally, we propose a theoretical framework to analyze the local and global convergence properties of our algorithms. |
| Researcher Affiliation | Academia | Syamantak Kumar1 Purnamrita Sarkar2 Peter Bickel3 Derek Bean4 1Department of Computer Science, UT Austin 2Department of Statistics and Data Sciences, UT Austin 3Department of Statistics, University of California, Berkeley 4 Department of Statistics, University of Wisconsin, Madison syamantak@utexas.edu, purna.sarkar@austin.utexas.edu, bickel@stat.berkeley.edu, derekb@stat.wisc.edu |
| Pseudocode | Yes | Algorithm 1 (Meta-algorithm for choosing the best candidate algorithm). Input: algorithm list L, data X ∈ ℝ^{n×k}. For j = 1, …, size(L): B_j ← L_j(X) (extract mixing matrix B_j using the j-th candidate algorithm); δ_j ← Ê_{t∼N(0,I_k)}[h(t, B_j⁻¹ \| P̂)], the characteristic-function-based score evaluated under the empirical distribution P̂. End for. i ← argmin_{j∈[size(L)]} δ_j; return B_i. (A minimal code sketch of this procedure is given after the table.) |
| Open Source Code | Yes | MATLAB implementations (under the GNU General Public License) can be found at Fast ICA and JADE. The code for PFICA was provided on request by the authors. |
| Open Datasets | No | The paper describes generating synthetic data and using MNIST images, but it does not provide concrete access information (link, DOI, specific citation with authors/year) for the MNIST dataset to confirm public availability. |
| Dataset Splits | No | The paper refers to 'random runs' and '100 random runs' but does not specify training, validation, or test splits. It mentions 'sample size n = 10^5' but not how this data is partitioned for model development and evaluation. |
| Hardware Specification | Yes | Our experiments were performed on a Macbook Pro M2 2022 CPU with 8 GB RAM. |
| Software Dependencies | No | The paper mentions 'MATLAB implementations' for Fast ICA and JADE but does not specify their version numbers or any other software dependencies with versions. |
| Experiment Setup | Yes | Experimental setup: Similar to [49], the mixing matrix B is constructed as B = UΛVᵀ, where U, V are k-dimensional random orthonormal matrices, and Λ is a diagonal matrix with Λ_ii ∈ [1, 3]. The covariance matrix Σ of the noise g follows the Wishart distribution and is chosen to be (ρ/k)·RRᵀ, where k is the number of sources and R is a random Gaussian matrix. Higher values of the noise power ρ can make the noisy ICA problem harder. Keeping B fixed, we report the median of 100 random runs of data generated from a given distribution (different for different experiments). ... The quasi-orthogonalization matrix for CHF and CGF is initialized as B̂B̂ᵀ using the mixing matrix B̂ estimated via PFICA. The performance of CHF and CGF is based on a single random initialization of the vector in the power method (see Algorithm 2). (A data-generation sketch following this description appears after the table.) |
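
Below is a minimal, hedged Python sketch of the meta-algorithm described in the Pseudocode row. The contrast `score_fn` is a hypothetical placeholder standing in for the paper's characteristic-function-based quantity h(t, B_j⁻¹ | P̂), whose exact form is not reproduced in this summary, and the candidate algorithms (e.g. wrappers around FastICA, JADE, or PFICA) are assumed to be supplied as callables.

```python
import numpy as np

def select_mixing_matrix(algorithms, X, score_fn, n_mc=1000, seed=0):
    """Sketch of the meta-algorithm: score each candidate and keep the best.

    algorithms : list of callables, each mapping data X (n x k) to an
                 estimated mixing matrix B_hat (k x k).
    score_fn   : callable score_fn(t, B_hat, X) -- hypothetical stand-in for
                 the paper's characteristic-function-based contrast
                 h(t, B_hat^{-1} | P_hat); not reproduced here.
    n_mc       : Monte Carlo draws t ~ N(0, I_k) approximating E_t[h(...)].
    """
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    candidates, deltas = [], []
    for alg in algorithms:
        B_hat = alg(X)                        # extract mixing matrix with this candidate
        ts = rng.standard_normal((n_mc, k))   # t ~ N(0, I_k)
        delta = float(np.mean([score_fn(t, B_hat, X) for t in ts]))
        candidates.append(B_hat)
        deltas.append(delta)
    best = int(np.argmin(deltas))             # smallest score wins
    return candidates[best], deltas
```

In use, a candidate list such as `[run_fastica, run_jade, run_pfica]` (hypothetical wrapper names) would be passed in, and the returned δ values play the role of the diagnostic scores discussed in the paper's simulations.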
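The Experiment Setup row can likewise be mirrored by a short data-generation sketch. The construction B = UΛVᵀ with Λ_ii ∈ [1, 3] and Σ = (ρ/k)RRᵀ follows the description above; the source distribution, sample size, and dimension k used here are illustrative assumptions, since these vary across the paper's experiments.

```python
import numpy as np

def random_orthonormal(k, rng):
    # Orthonormal factor from the QR decomposition of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((k, k)))
    return Q

def make_noisy_ica_data(n=100_000, k=4, rho=1.0, seed=0):
    """Illustrative generator for the noisy ICA model x = B s + g."""
    rng = np.random.default_rng(seed)
    # Mixing matrix B = U Lambda V^T, with Lambda_ii drawn from [1, 3].
    U, V = random_orthonormal(k, rng), random_orthonormal(k, rng)
    B = U @ np.diag(rng.uniform(1.0, 3.0, size=k)) @ V.T
    # Noise covariance Sigma = (rho / k) * R R^T (Wishart-type), R Gaussian.
    R = rng.standard_normal((k, k))
    Sigma = (rho / k) * (R @ R.T)
    # Sources: unit-variance non-Gaussian signals (uniform here, an assumption).
    S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, k))
    G = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
    X = S @ B.T + G
    return X, B, Sigma
```

Increasing `rho` raises the noise power, which, as noted above, tends to make the noisy ICA problem harder.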