Learning Feature Engineering for Classification

Authors: Fatemeh Nargesian, Horst Samulowitz, Udayan Khurana, Elias B. Khalil, Deepak Turaga

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results show that LFE outperforms other feature engineering approaches for an overwhelming majority (89%) of the datasets from various sources while incurring a substantially lower computational cost.
Researcher Affiliation | Collaboration | University of Toronto, IBM Research, Georgia Institute of Technology
Pseudocode | No | The paper describes its methods in prose but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states that components were implemented in TensorFlow and Scikit-learn, but it does not provide concrete access to the source code for the described methodology, nor does it explicitly state that the code is open source or otherwise available.
Open Datasets | Yes | We collected 900 classification datasets from the OpenML and UCI repositories to train transformation classifiers. [Lichman, 2013] M. Lichman. UCI Machine Learning Repository, 2013. [Vanschoren et al., 2014] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: Networked science in machine learning. SIGKDD Explor. Newsl., 15(2):49–60, June 2014.
Dataset Splits | Yes | Training samples were generated for Random Forest and Logistic Regression using 10-fold cross validation and a performance improvement threshold of 1%.
Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions implementing components in TensorFlow and Scikit-learn, but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | All transformation classifiers are MLPs with one hidden layer. We tuned the number of hidden units to optimize the F-score for each classifier, and they vary from 400 to 500. We use Stochastic Gradient Descent with minibatches to train transformation MLPs. In order to prevent overfitting, we apply regularization and drop-out [Srivastava et al., 2014]. We considered scaling range of [-10, 10] and quantile data sketch size of 200 bins.
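The Dataset Splits row above (10-fold cross validation with a 1% improvement threshold) can be illustrated with a minimal sketch. This is not the authors' code: it assumes the transformed feature is appended to the original feature matrix, uses scikit-learn's cross_val_score with a Random Forest, and the function name make_label and the default threshold value are purely illustrative.

```python
# Illustrative sketch (not from the paper): label a (feature, transformation)
# pair as useful when adding the transformed feature raises the mean 10-fold
# cross-validated score by at least theta = 0.01 (the 1% threshold).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def make_label(X, y, feature_idx, transform, theta=0.01):
    """Return 1 if transforming feature `feature_idx` helps the model, else 0."""
    base = cross_val_score(RandomForestClassifier(), X, y, cv=10).mean()

    # Append the transformed feature as a new column (an assumption; the
    # paper's exact sample-generation protocol may differ).
    X_aug = np.column_stack([X, transform(X[:, feature_idx])])
    augmented = cross_val_score(RandomForestClassifier(), X_aug, y, cv=10).mean()

    return int(augmented - base >= theta)


# Example usage with a hypothetical dataset (X, y):
# label = make_label(X, y, feature_idx=0, transform=np.log1p)
```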
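The Experiment Setup row describes the input representation (quantile data sketches of 200 bins over feature values scaled to [-10, 10]) and the transformation classifiers (one-hidden-layer MLPs with 400 to 500 units, trained with mini-batch SGD, dropout, and regularization). Below is a minimal sketch of what such a pipeline could look like in TensorFlow/Keras. The ReLU activation, dropout rate, L2 weight, learning rate, and batch size are assumptions, the helper names quantile_sketch and feature_representation are illustrative, and a binary classification task is assumed so that one sketch per class can be concatenated.

```python
# Illustrative sketch (not the authors' implementation) of a quantile-sketch
# input representation and a one-hidden-layer MLP transformation classifier.
import numpy as np
import tensorflow as tf

BINS = 200          # quantile data sketch size from the paper
SCALE = (-10, 10)   # scaling range from the paper


def quantile_sketch(values, bins=BINS, scale=SCALE):
    """Fixed-size, normalized histogram of one feature's values for one class."""
    lo, hi = scale
    v = np.asarray(values, dtype=float)
    # Scale raw values into [lo, hi] before binning.
    v = lo + (hi - lo) * (v - v.min()) / (v.max() - v.min() + 1e-12)
    hist, _ = np.histogram(v, bins=bins, range=(lo, hi))
    return hist / max(hist.sum(), 1)


def feature_representation(values, labels):
    """Concatenate per-class sketches (binary classification assumed here)."""
    return np.concatenate([quantile_sketch(values[labels == c]) for c in (0, 1)])


# One-hidden-layer MLP; hidden units were tuned in the 400-500 range in the paper.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2 * BINS,)),
    tf.keras.layers.Dense(450, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_sketches, y_useful, batch_size=32, epochs=20)
```

In the setup described by the paper, one such classifier would be trained per transformation on samples drawn from the 900 collected OpenML and UCI datasets.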