Fast Generic Interaction Detection for Model Interpretability and Compression
Authors: Tianjian Zhang, Feng Yin, Zhi-Quan Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS |
| Researcher Affiliation | Academia | ¹School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; ²Shenzhen Research Institute of Big Data. tianjianzhang@link.cuhk.edu.cn, {yinfeng,luozq}@cuhk.edu.cn |
| Pseudocode | Yes | Algorithm 1 UCB({A_r : r ∈ [n]}, m, k) |
| Open Source Code | Yes | The code is available at https://github.com/zhangtj1996/ParaACE. |
| Open Datasets | Yes | We generated 10 synthetic datasets (p = 10) as was used in Tsang et al. (2018a)... We also selected 5 real datasets, namely the Elevators for controlling an F16 aircraft (Torgo, 2019), Parkinsons for predicting the total UPDRS scores (Tsanas et al., 2009), Skillcraft for game player behavior analysis (Thompson et al., 2013), Cal housing for house price prediction (Pace & Barry, 1997), and Bike sharing for predicting hourly bike rental count (Fanaee-T & Gama, 2014). The datasets are preprocessed, and details are shown in supplement D. |
| Dataset Splits | No | The paper mentions '800 training samples and 200 test samples' for synthetic data but does not define a separate validation set. It reports results averaged over 5 folds, which is a cross-validation strategy rather than a distinct validation split. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'adam' optimizer and 'MXNet' but does not provide specific version numbers for these software components. It also mentions PyTorch in the context of automatic differentiation but no version. |
| Experiment Setup | Yes | For the baseline OverparaFC (Teacher), we used the architecture of p-5000-900-400-100-30-1... For the ParaACE model (Student)... the structure of which is 1-50-8-1 for main effects and 2-50-8-1 for pairwise interactions. The fix-up layer is chosen as a single-layer ResNet with the number of neurons equal to 15. All the above networks were initialized with Kaiming's strategy (He et al., 2015). We chose adam (Kingma & Ba, 2014) as the optimizer... and the batch size is set to be 500 for all the datasets. In Algorithm 1, we pull each arm 3 times for initialization. Each arm will be pulled at most m = 100 times, and we terminate when k = 20 strongest interactions stand out. |
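The paper's Algorithm 1 is a UCB procedure over interaction "arms", and the setup row above fixes its parameters: 3 initialization pulls per arm, at most m = 100 pulls per arm, and termination once the k = 20 strongest interactions stand out. A minimal sketch of such a top-k UCB loop is below; the function names, the hypothetical `pull(r)` (a noisy sample of arm r's interaction strength), the exploration constant `c`, and the LUCB-style choice of which boundary arms to refine are all assumptions for illustration, not the paper's exact algorithm.

```python
import math
import random

def ucb_top_k(pull, n_arms, m=100, k=20, n_init=3, c=1.0):
    """Sketch of top-k arm identification with UCB-style confidence bounds.

    pull(r) returns a noisy sample of arm r's value (here: interaction
    strength). Each arm is pulled n_init times to initialize, and no arm
    is pulled more than m times in total.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for r in range(n_arms):              # initialization pulls
        for _ in range(n_init):
            sums[r] += pull(r)
            counts[r] += 1
    total = n_arms * n_init
    while True:
        means = [sums[r] / counts[r] for r in range(n_arms)]
        rad = [c * math.sqrt(math.log(total) / counts[r]) for r in range(n_arms)]
        order = sorted(range(n_arms), key=lambda r: means[r], reverse=True)
        top, rest = order[:k], order[k:]
        # Terminate when the k empirically best arms are separated from the
        # rest by their confidence bounds (or there is nothing to separate).
        if not rest or min(means[r] - rad[r] for r in top) >= max(
            means[r] + rad[r] for r in rest
        ):
            return top
        # LUCB-style refinement: pull the two most ambiguous boundary arms,
        # respecting the per-arm budget m.
        weakest_top = min(top, key=lambda r: means[r] - rad[r])
        strongest_rest = max(rest, key=lambda r: means[r] + rad[r])
        pulled = False
        for r in {weakest_top, strongest_rest}:
            if counts[r] < m:
                sums[r] += pull(r)
                counts[r] += 1
                total += 1
                pulled = True
        if not pulled:                   # both boundary arms hit their budget
            return top
```

With well-separated arm values and low noise, the loop recovers the k strongest arms well before exhausting the m-pull budget; in the paper's setting each arm corresponds to a candidate feature interaction.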