Less Is Better: Unweighted Data Subsampling via Influence Function

Authors: Zifeng Wang, Hong Zhu, Zhenhua Dong, Xiuqiang He, Shao-Lun Huang (pp. 6340-6347)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiment results demonstrate the method's superiority over existing subsampling methods in diverse tasks, such as text classification, image classification, click-through prediction, etc.
Researcher Affiliation | Collaboration | Zifeng Wang (1), Hong Zhu (2), Zhenhua Dong (2), Xiuqiang He (2), Shao-Lun Huang (1); (1) Tsinghua-Berkeley Shenzhen Institute, Tsinghua University; (2) Noah's Ark Lab, Huawei
Pseudocode | No | The paper includes Fig. 2, titled "Our unweighted subsampling framework", which is a flowchart diagram, not structured pseudocode or an algorithm block.
Open Source Code | Yes | The code can be found at https://github.com/RyanWangZf/InfluenceSubsampling
Open Datasets | Yes | The authors perform extensive experiments on public data sets covering many domains, including computer vision, natural language processing, and click-through rate prediction, and additionally test the methods on the Company data set. Data set statistics and preprocessing details for some data sets are described in appendix E. (Specific datasets mentioned include UCI breast-cancer, diabetes, News20, UCI Adult, CIFAR-10, MNIST, real-sim, SVHN, skin-nonskin, Criteo 1%, Covertype, Avazu-app, and Avazu-site.)
Dataset Splits | Yes | The experiments use a Tr-Va-Te setting, which differs from the Tr-Va setting used in much previous work (see Fig. 4). Both settings proceed in three steps and share the same first two: 1) train the model θ̂ on the full Tr, predict on Va, then compute the influence function (IF); 2) derive sampling probabilities from the IF, sample a subset from Tr, then train the subset model θ.
Hardware Specification | Yes | (a) Run on an Intel i7-6600U CPU @ 2.60 GHz. (b) Run on an Intel Xeon E5-2670 v3 CPU @ 2.30 GHz.
Software Dependencies | No | The paper mentions using logistic regression and methods such as Preconditioned Conjugate Gradient (PCG), but does not provide version numbers for any software, libraries, or programming languages used.
Experiment Setup | No | The paper states "More details about experimental settings can be found in appendix F.", indicating that specific hyperparameter values and training configurations are not present in the main text.
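The two shared steps described under Dataset Splits (train on the full training set, compute influence scores on validation data, map scores to sampling probabilities, then draw a subset) can be sketched as follows. This is a minimal illustration for logistic regression, not the authors' implementation: the score `IF_i = -g_va^T H^{-1} g_i` is the standard first-order influence approximation, and the probability mapping in `subsample` (a sigmoid of the negated score, normalized) is a hypothetical placeholder for the paper's actual sampling rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(w, X, y):
    # Per-example gradient of the logistic loss; labels y are in {0, 1}.
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X  # shape (n, d)

def influence_scores(w, X_tr, y_tr, X_va, y_va, damping=1e-3):
    """First-order influence of each training point on the validation loss:
    IF_i = -g_va^T H^{-1} g_i, with a damped Hessian for stability."""
    n, d = X_tr.shape
    p = sigmoid(X_tr @ w)
    # Hessian of the mean training loss for logistic regression.
    H = (X_tr * (p * (1 - p))[:, None]).T @ X_tr / n + damping * np.eye(d)
    g_va = logistic_grad(w, X_va, y_va).mean(axis=0)  # mean validation gradient
    s = np.linalg.solve(H, g_va)                      # H^{-1} g_va
    g_tr = logistic_grad(w, X_tr, y_tr)               # per-example gradients
    return -g_tr @ s                                  # shape (n,)

def subsample(X_tr, y_tr, scores, m, rng=None):
    # Hypothetical mapping from influence scores to sampling probabilities;
    # the paper defines its own rule, this is only an illustrative choice.
    rng = rng or np.random.default_rng(0)
    probs = sigmoid(-scores)
    probs /= probs.sum()
    idx = rng.choice(len(scores), size=m, replace=False, p=probs)
    return X_tr[idx], y_tr[idx]
```

The subset returned here would then be used to retrain an unweighted model, matching step 2 of the Tr-Va-Te pipeline; the damping term is a common practical addition so the Hessian solve stays well-conditioned.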