Efficiently Learning Significant Fourier Feature Pairs for Statistical Independence Testing

Authors: Yixin Ren, Yewei Xia, Hao Zhang, Jihong Guan, Shuigeng Zhou

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical evaluation on both synthetic and real datasets validates our method's superiority in effectiveness and efficiency, particularly in handling high-dimensional data and dealing with large-scale scenarios."
Researcher Affiliation | Academia | "1) Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China; 2) Department of Computer Science and Technology, Tongji University, Shanghai, China; 3) SIAT, Chinese Academy of Sciences, Shenzhen, China; 4) Machine Learning Department, MBZUAI, Abu Dhabi, UAE"
Pseudocode | Yes | "Algorithm 1: The learning and testing framework"
Open Source Code | Yes | "we also provide the experimental data/code in the supplemental material."
Open Datasets | Yes | "We evaluate on four synthetic datasets [18, 30] and two real datasets [44, 30]. ... The first real dataset used is a high-dimensional image dataset, 3Dshapes, as in [30]. ... we consider the Million Song Data (MSD) as the second real dataset."
Dataset Splits | No | The paper mentions a 0.5 split ratio between training and testing data, but does not explicitly describe a separate validation split: "The split ratio is set to 0.5 to facilitate the balance between the two."
Hardware Specification | Yes | "The experiments are conducted with the same equipment, specifically a 6-core CPU with a 3080 GPU." ... "The experiments are all conducted on the same equipment, specifically a 14-core CPU with a 4090 GPU."
Software Dependencies | No | The paper mentions the "Adam [21] optimizer" but does not specify versions for any software or libraries used in the implementation or experiments.
Experiment Setup | Yes | "The number of random features D for FHSIC and LFHSIC-G/M, the number of induced variables for NyHSIC, the block size for BHSIC, as well as the number of sub-diagonals R for HSICAgg are all kept consistent as recommended in [44] for fair evaluation. Specifically, we set the number of random mappings in RDC to 20 to ensure compatibility with large-scale datasets. The test location parameter J of NFHSIC is set as default as 10... The maximum number of iterations for the optimization is set to 100 for NFSIC and LFHSIC-G/M. The default learning rate of the optimization of LFHSIC-G/M is set as 0.05 in all the experiments."
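For context on the quoted setup, the baselines named there (FHSIC and its relatives) approximate an HSIC independence statistic with random Fourier features. The sketch below is a minimal, illustrative NumPy version of such a statistic — it is not the paper's learned-feature method, and the function names, the bandwidth choice, and the feature count D=20 are assumptions made here for demonstration only.

```python
import numpy as np

def rff(X, D, sigma, rng):
    """Random Fourier features approximating a Gaussian kernel of bandwidth sigma."""
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # frequencies ~ N(0, 1/sigma^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fhsic(X, Y, D=20, sigma=1.0, seed=0):
    """Fourier-feature HSIC-style statistic: squared Frobenius norm of the
    empirical cross-covariance between centered feature maps of X and Y.
    Larger values suggest stronger dependence."""
    rng = np.random.default_rng(seed)
    phi = rff(X, D, sigma, rng)
    psi = rff(Y, D, sigma, rng)
    phi = phi - phi.mean(axis=0)
    psi = psi - psi.mean(axis=0)
    C = phi.T @ psi / X.shape[0]                     # D x D cross-covariance
    return float(np.sum(C ** 2))
```

In a full test, the statistic would be compared against a permutation or asymptotic null distribution to obtain a p-value; the snippet only shows the feature-based statistic itself.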