Data-faithful Feature Attribution: Mitigating Unobservable Confounders via Instrumental Variables

Authors: Qiheng Sun, Haocheng Xia, Jinfei Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments on both synthetic and real-world datasets demonstrate the effectiveness of our approaches."
Researcher Affiliation | Academia | Qiheng Sun (Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security), Haocheng Xia (Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign), Jinfei Liu (Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security). Contact: {qiheng_sun,jinfeiliu}@zju.edu.cn, hxia7@illinois.edu
Pseudocode | Yes | "For the algorithm process we propose, which utilizes confidence intervals to optimize the calculation of Shapley values, refer to Algorithm 1."
Open Source Code | Yes | "Our code can be found in the repository at https://github.com/ZJU-DIVER/IV-SHAP."
Open Datasets | Yes | "The first real dataset we used is the Griliches76 dataset [17, 36]... The second real dataset we use is the Angrist and Krueger dataset [3]..."
Dataset Splits | No | The paper describes generating synthetic data and training models, but it does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) needed to reproduce the partitioning.
Hardware Specification | Yes | "We conduct experiments on a machine with 2 Montage(R) Jintide(R) C6226R @ 2.90GHz and 256GB memory."
Software Dependencies | No | The paper mentions software components such as a neural network model and XGBoost, but it does not specify version numbers for these or any other libraries/packages.
Experiment Setup | No | The paper mentions using neural network and XGBoost models and discusses loss functions, but it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings for model training.
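The Pseudocode row refers to an algorithm that uses confidence intervals to speed up Shapley value computation. The following is a minimal sketch of that general idea, not a reproduction of the paper's Algorithm 1: permutation-sampling Monte Carlo estimation of Shapley values, stopped early once a CLT-based confidence interval on every feature's estimate shrinks below a tolerance. The function name `shapley_ci`, the tolerance `eps`, and the burn-in of 30 permutations are our illustrative choices, not details from the paper.

```python
import numpy as np

def shapley_ci(value_fn, n_features, eps=0.05, z=1.96, max_iter=2000, seed=0):
    """Monte Carlo Shapley estimation with a confidence-interval stopping rule.

    value_fn: maps a frozenset of feature indices to a real-valued payoff.
    Stops when the z-level CI half-width of every feature's running mean
    marginal contribution falls below eps (illustrative sketch only).
    """
    rng = np.random.default_rng(seed)
    sums = np.zeros(n_features)      # running sum of marginal contributions
    sq_sums = np.zeros(n_features)   # running sum of squared contributions
    t = 0
    while t < max_iter:
        t += 1
        perm = rng.permutation(n_features)
        coalition = set()
        prev = value_fn(frozenset())
        for i in perm:
            coalition.add(i)
            cur = value_fn(frozenset(coalition))
            marginal = cur - prev
            sums[i] += marginal
            sq_sums[i] += marginal ** 2
            prev = cur
        if t >= 30:  # small burn-in before trusting the normal approximation
            mean = sums / t
            var = np.maximum(sq_sums / t - mean ** 2, 0.0)
            half_width = z * np.sqrt(var / t)
            if np.all(half_width < eps):
                break
    return sums / t

# Example: for an additive game v(S) = sum of weights in S,
# the exact Shapley value of feature i is w[i].
w = np.array([1.0, 2.0, 3.0])
phi = shapley_ci(lambda S: sum(w[j] for j in S), n_features=3)
```

For the additive toy game every marginal contribution is constant, so the variance estimate collapses to zero and the loop exits right after the burn-in; for a real model-valuation game, `value_fn` would evaluate the model on a coalition of features and the CI width governs how many permutations are actually sampled.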