Efficiently Learning Significant Fourier Feature Pairs for Statistical Independence Testing
Authors: Yixin Ren, Yewei Xia, Hao Zhang, Jihong Guan, Shuigeng Zhou
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical evaluation on both synthetic and real datasets validates our method's superiority in effectiveness and efficiency, particularly in handling high-dimensional data and dealing with large-scale scenarios. |
| Researcher Affiliation | Academia | 1Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China 2Department of Computer Science and Technology, Tongji University, Shanghai, China 3SIAT, Chinese Academy of Sciences, Shenzhen, China 4Machine Learning Department, MBZUAI, Abu Dhabi, UAE |
| Pseudocode | Yes | Algorithm 1 The learning and testing framework |
| Open Source Code | Yes | we also provide the experimental data/code in the supplemental material. |
| Open Datasets | Yes | We evaluate on four synthetic datasets [18, 30] and two real datasets [44, 30]. ... The first real dataset used is a high-dimensional image dataset 3Dshapes as in [30]. ... we consider the Million Song Data (MSD) as the second real dataset. |
| Dataset Splits | No | The paper mentions a 0.5 split ratio between training and testing data, but does not describe a separate validation split. For example: 'The split ratio is set to 0.5 to facilitate the balance between the two.' |
| Hardware Specification | Yes | The experiments are conducted with the same equipment, specifically a 6-core CPU with a 3080 GPU. ... The experiments are all conducted on the same equipment, specifically a 14-core CPU with a 4090 GPU. |
| Software Dependencies | No | The paper mentions 'Adam [21] optimizer' but does not specify versions for any software or libraries used in the implementation or experimentation. |
| Experiment Setup | Yes | The number of random features D for FHSIC and LFHSIC-G/M, the number of induced variables for NyHSIC, the block size for BHSIC, and the number of sub-diagonals R for HSICAgg are all kept consistent with the recommendations in [44] for fair evaluation. Specifically, we set the number of random mappings in RDC to 20 to ensure compatibility with large-scale datasets. The test location parameter J of NFSIC is set to its default of 10... The maximum number of iterations for the optimization is set to 100 for NFSIC and LFHSIC-G/M. The default learning rate of the optimization of LFHSIC-G/M is set to 0.05 in all experiments. |
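
The fixed hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be gathered into a single configuration sketch. This is illustrative only: the key names are hypothetical, but the values come directly from the quoted text.

```python
# Hypothetical configuration sketch of the reported experiment setup.
# Key names are invented for illustration; values are taken from the
# quoted evidence (RDC mappings, NFSIC test locations, optimization
# settings for LFHSIC-G/M, and the train/test split ratio).
EXPERIMENT_CONFIG = {
    "rdc_num_random_mappings": 20,   # RDC, for large-scale compatibility
    "nfsic_test_locations_J": 10,    # NFSIC default
    "max_optimization_iters": 100,   # NFSIC and LFHSIC-G/M
    "lfhsic_learning_rate": 0.05,    # Adam optimizer, default setting
    "train_test_split_ratio": 0.5,   # balance between learning and testing
}

def describe_setup(config: dict) -> str:
    """Render the configuration as a short human-readable summary."""
    return ", ".join(f"{k}={v}" for k, v in sorted(config.items()))
```

A summary string such as `describe_setup(EXPERIMENT_CONFIG)` can then be logged alongside results to make runs comparable across methods.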