Less Is Better: Unweighted Data Subsampling via Influence Function
Authors: Zifeng Wang, Hong Zhu, Zhenhua Dong, Xiuqiang He, Shao-Lun Huang (pp. 6340-6347)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment results demonstrate our method's superiority over existing subsampling methods in diverse tasks, such as text classification, image classification, click-through prediction, etc. |
| Researcher Affiliation | Collaboration | Zifeng Wang (1), Hong Zhu (2), Zhenhua Dong (2), Xiuqiang He (2), Shao-Lun Huang (1); (1) Tsinghua-Berkeley Shenzhen Institute, Tsinghua University, (2) Noah's Ark Lab, Huawei |
| Pseudocode | No | The paper includes Fig. 2 titled 'Our unweighted subsampling framework', which is a flowchart diagram, not structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The code can be found at https://github.com/RyanWangZf/InfluenceSubsampling |
| Open Datasets | Yes | We perform extensive experiments on various public data sets which cover many domains, including computer vision, natural language processing, click-through rate prediction, etc. Additionally, we test the methods on the Company data set...The data set statistics and more details about preprocessing on some data sets are described in appendix E. (Mentions specific datasets like UCI breast-cancer, diabetes, News20, UCI Adult, cifar10, MNIST, real-sim, SVHN, skin-nonskin, Criteo1%, Covertype, Avazu-app, Avazu-site). |
| Dataset Splits | Yes | In our experiments, we use a Tr-Va-Te setting which differs from the Tr-Va setting used in much previous work (see Fig. 4). Both settings proceed in three steps and share the same first two: 1) training the model θ̂ on the full Tr, predicting on the Va, then computing the IF; 2) deriving sampling probabilities from the IF, sampling from Tr to obtain the subset, then training the subset-model θ. |
| Hardware Specification | Yes | (a) Run on the Intel i7-6600U CPU @ 2.60GHz. (b) Run on the Intel Xeon CPU E5-2670 v3 @ 2.30GHz. |
| Software Dependencies | No | The paper mentions using logistic regression and methods like Preconditioned Conjugate Gradient (PCG) but does not provide specific version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | No | The paper states 'More details about experimental settings can be found in appendix F.', indicating that specific hyperparameter values or training configurations are not present in the main text provided. |
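The Dataset Splits row above outlines the paper's pipeline: fit a model on the full training set, compute each training point's influence on the validation loss, map influences to sampling probabilities, draw an unweighted subset, and retrain. A minimal sketch of that flow for L2-regularized logistic regression is below; the softmax mapping from influence to sampling probability is a placeholder assumption, not the paper's exact scheme, and all function names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam=1e-2, iters=300, lr=0.5):
    # Gradient-descent fit of L2-regularized logistic regression.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / n + lam * w)
    return w

def influence_on_val(X_tr, y_tr, X_va, y_va, w, lam=1e-2):
    # Influence of upweighting each training point z_i on the
    # validation loss: I(z_i) = -g_va^T H^{-1} g_i.
    n, d = X_tr.shape
    p_tr = sigmoid(X_tr @ w)
    D = p_tr * (1.0 - p_tr)
    H = (X_tr * D[:, None]).T @ X_tr / n + lam * np.eye(d)  # training Hessian
    p_va = sigmoid(X_va @ w)
    g_va = X_va.T @ (p_va - y_va) / len(y_va)               # val-loss gradient
    G = X_tr * (p_tr - y_tr)[:, None]                       # per-sample gradients
    return -G @ np.linalg.solve(H, g_va)

def if_subsample(X_tr, y_tr, infl, frac=0.5, rng=None):
    # Unweighted subsampling: points whose upweighting would *lower*
    # the validation loss (negative influence) get higher probability.
    # Softmax mapping is a placeholder, not the paper's exact rule.
    rng = rng or np.random.default_rng(0)
    scores = -infl
    prob = np.exp(scores - scores.max())
    prob /= prob.sum()
    k = int(frac * len(y_tr))
    idx = rng.choice(len(y_tr), size=k, replace=False, p=prob)
    return X_tr[idx], y_tr[idx]
```

A subset-model would then be obtained by calling `fit_logreg` again on the output of `if_subsample`, matching step 2 of the Tr-Va-Te procedure quoted above.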