Fast Generic Interaction Detection for Model Interpretability and Compression

Authors: Tianjian Zhang, Feng Yin, Zhi-Quan Luo

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS
Researcher Affiliation | Academia | (1) School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; (2) Shenzhen Research Institute of Big Data. tianjianzhang@link.cuhk.edu.cn, {yinfeng,luozq}@cuhk.edu.cn
Pseudocode | Yes | Algorithm 1: UCB({A_r : r ∈ [n]}, m, k)
Open Source Code | Yes | The code is available at https://github.com/zhangtj1996/ParaACE.
Open Datasets | Yes | We generated 10 synthetic datasets (p = 10) as was used in Tsang et al. (2018a)... We also selected 5 real datasets, namely the Elevators for controlling an F16 aircraft (Itorgo, 2019), Parkinsons for predicting the total UPDRS scores (Tsanas et al., 2009), Skillcraft for game player behavior analysis (Thompson et al., 2013), Cal housing for house price prediction (Pace & Barry, 1997), and Bike sharing for predicting hourly bike rental count (Fanaee-T & Gama, 2014). The datasets are preprocessed, and details are shown in supplement D.
Dataset Splits | No | The paper mentions '800 training samples and 200 test samples' for the synthetic data but does not explicitly define a separate validation set. It reports results averaged over 5 folds, i.e., a cross-validation strategy, but no distinct validation split.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the 'adam' optimizer and 'MXNet' but does not provide version numbers for these software components. It also mentions PyTorch in the context of automatic differentiation, again without a version.
Experiment Setup | Yes | For the baseline OverparaFC (Teacher), we used the architecture of p-5000-900-400-100-30-1... For the ParaACE model (Student)... the structure of which is 1-50-8-1 for main effects and 2-50-8-1 for pairwise interactions. The fix-up layer is chosen as a single-layer ResNet with the number of neurons equal to 15. All the above networks were initialized with Kaiming's strategy (He et al., 2015). We chose adam (Kingma & Ba, 2014) as the optimizer... and the batch size is set to be 500 for all the datasets. In Algorithm 1, we pull each arm 3 times for initialization. Each arm will be pulled at maximally m = 100 times, and we terminate when k = 20 strongest interactions stand out.
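
The Pseudocode row cites Algorithm 1, UCB({A_r : r ∈ [n]}, m, k), and the Experiment Setup row reports its budget (3 initialization pulls per arm, at most m = 100 pulls per arm, stop once the k = 20 strongest interactions stand out). The sketch below is a generic UCB-style top-k identification loop with that interface, not the paper's exact procedure: the `pull` callable, the exploration constant `c`, and the confidence-bound stopping test are illustrative assumptions standing in for the paper's interaction-strength measure and termination rule.

```python
import numpy as np

def ucb_top_k(arms, pull, m=100, k=20, n_init=3, c=2.0):
    """UCB-style identification of the k strongest arms (candidate interactions).

    arms   : list of arm identifiers, e.g. feature-index pairs (i, j)
    pull   : callable(arm) -> float, a noisy estimate of that arm's
             interaction strength (placeholder for the paper's measure)
    m      : maximum number of pulls per arm (reported: m = 100)
    k      : number of strongest interactions to keep (reported: k = 20)
    n_init : initialization pulls per arm (reported: 3)
    c      : exploration weight of the confidence bonus (assumed)
    """
    n = len(arms)
    counts = np.zeros(n, dtype=int)
    means = np.zeros(n)

    def update(idx, reward):
        counts[idx] += 1
        means[idx] += (reward - means[idx]) / counts[idx]

    # Initialization: pull every arm n_init times.
    for idx in range(n):
        for _ in range(n_init):
            update(idx, pull(arms[idx]))

    while True:
        bonus = c * np.sqrt(np.log(counts.sum()) / counts)
        upper, lower = means + bonus, means - bonus

        # Stop when the k arms with the largest means clearly stand out:
        # their lower confidence bounds dominate every other upper bound.
        order = np.argsort(means)[::-1]
        top, rest = order[:k], order[k:]
        if rest.size == 0 or lower[top].min() >= upper[rest].max():
            break

        # Otherwise pull the not-yet-exhausted arm with the largest UCB.
        candidate = np.where(counts < m, upper, -np.inf)
        idx = int(np.argmax(candidate))
        if not np.isfinite(candidate[idx]):  # every arm reached the budget m
            break
        update(idx, pull(arms[idx]))

    order = np.argsort(means)[::-1][:k]
    return [arms[i] for i in order], means[order]
```

Stopping once the k best lower confidence bounds dominate every remaining upper bound is one common way to formalize "the k strongest interactions stand out"; the paper's own criterion may differ. In use, `arms` could enumerate all feature pairs and `pull(arm)` could return a noisy interaction score computed from the trained teacher at a randomly drawn sample.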
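
The Experiment Setup row specifies the teacher and student architectures, initialization, optimizer, and batch size. Below is a minimal PyTorch sketch that instantiates those shapes for illustration; the paper mentions MXNet, and the helper names (`mlp`, `FixUp`), the additive combination of student subnetworks, and the exact form of the 15-neuron fix-up residual layer are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Fully connected ReLU net for a layer-size list, with Kaiming init."""
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        lin = nn.Linear(d_in, d_out)
        nn.init.kaiming_normal_(lin.weight)  # Kaiming initialization (He et al., 2015)
        layers += [lin, nn.ReLU()]
    return nn.Sequential(*layers[:-1])       # drop the activation after the output layer

class FixUp(nn.Module):
    """Single residual layer with 15 hidden units (exact form assumed)."""
    def __init__(self, dim=1, hidden=15):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
    def forward(self, x):
        return x + self.block(x)

p, k = 10, 20  # synthetic data: 10 features; keep the 20 strongest interactions

# Teacher: over-parameterized fully connected net, p-5000-900-400-100-30-1.
teacher = mlp([p, 5000, 900, 400, 100, 30, 1])

# Student: one 1-50-8-1 subnet per main effect, one 2-50-8-1 subnet per
# detected pairwise interaction, plus the fix-up layer on the combined output.
main_effects = nn.ModuleList([mlp([1, 50, 8, 1]) for _ in range(p)])
interactions = nn.ModuleList([mlp([2, 50, 8, 1]) for _ in range(k)])
fixup = FixUp(dim=1, hidden=15)

student_params = (list(main_effects.parameters())
                  + list(interactions.parameters())
                  + list(fixup.parameters()))
optimizer = torch.optim.Adam(student_params)  # paper: adam (Kingma & Ba, 2014)
batch_size = 500                              # reported for all datasets
```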