Feature Engineering for Predictive Modeling Using Reinforcement Learning

Authors: Udayan Khurana, Horst Samulowitz, Deepak Turaga

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We tested the impact of our FE on 48 publicly available datasets (different from the datasets used for training) from a variety of domains, and of various sizes. We report the accuracy of (a) base dataset; (b) our FE routine with RL1, Bmax = 100; (c) Expansion-reduction implementation, where all transformations are first applied separately and added to the original columns, followed by a feature selection routine; (d) Random: randomly applying a transformation function to random feature(s), adding the result to the original dataset, and measuring the CV performance; this is repeated 100 times and, finally, we consider all the new features whose cases showed an improvement in performance, along with the original features, to train a model; (e) Tree-Heur: our implementation of Cognito's (Khurana et al. 2016b) global search heuristic for 100 nodes. We used Random Forest with default parameters as our learning algorithm for all the comparisons as it gave us the strongest baseline (no FE) average. A 5-fold cross validation using random stratified sampling was used. The results for a representative set of 24 of those datasets (due to lack of space) are captured in Table 1.
Researcher Affiliation | Industry | Udayan Khurana (ukhurana@us.ibm.com, IBM Research AI); Horst Samulowitz (samulowitz@us.ibm.com, IBM Research AI); Deepak Turaga (turaga@us.ibm.com, IBM Research AI)
Pseudocode | Yes | Algorithm 1 outlines the general methodology for exploration. At each step, an estimated reward for each possible move, R(Gi, n, t, i/Bmax), is used to rank the options of actions available at each given state of the transformation graph Gi, ∀i ∈ [0, Bmax), where Bmax is the overall allocated budget in number of steps.
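The exploration the quote describes can be sketched as a budgeted greedy loop over the transformation graph. The sketch below is an illustrative reconstruction, not the paper's Algorithm 1: the helper names (`reward_estimate`, `apply_action`, `available_actions`) and the graph representation are assumptions.

```python
def explore(graph, reward_estimate, b_max, apply_action, available_actions):
    """Budget-constrained exploration sketch: at each step i, rank every
    candidate move (node n, transformation t) by the estimated reward
    R(G_i, n, t, i/b_max) and apply the highest-ranked one."""
    for i in range(b_max):
        candidates = available_actions(graph)
        if not candidates:
            break
        # The budget fraction i/b_max is passed to the estimator so it can
        # adapt its ranking as the remaining budget shrinks.
        best = max(candidates,
                   key=lambda a: reward_estimate(graph, a[0], a[1], i / b_max))
        graph = apply_action(graph, *best)
    return graph
```

With a toy graph (a list of applied moves) and a reward estimator that always prefers one transformation, the loop greedily accumulates that transformation until no candidates remain or the budget runs out.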
Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the described methodology.
Open Datasets | Yes | We tested the impact of our FE on 48 publicly available datasets (different from the datasets used for training) from a variety of domains, and of various sizes.
Dataset Splits | Yes | A 5-fold cross validation using random stratified sampling was used.
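The random stratified 5-fold split this row refers to can be sketched in plain Python. The function name and the round-robin fold assignment below are illustrative assumptions, not the paper's implementation:

```python
import random
from collections import defaultdict

def stratified_kfold_indices(labels, k=5, seed=0):
    """Split example indices into k folds while preserving each class's
    proportion in every fold (random stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)                 # randomize within each class
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)      # deal class members round-robin
    return folds

# Each CV iteration holds out one fold for testing and trains on the rest.
labels = [0] * 40 + [1] * 20
folds = stratified_kfold_indices(labels, k=5)
```

With 40 negatives and 20 positives, every fold ends up with 8 negatives and 4 positives, matching the overall 2:1 class ratio.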
Hardware Specification | Yes | For reference to runtime, it took the Bikeshare DC dataset 4 minutes, 40 seconds to run for 100 nodes for our FE, on a single thread on a 2.8GHz processor.
Software Dependencies | No | The paper does not provide specific version numbers for any software components used in the experiments. It only mentions using "Random Forest with default parameters" but no library name or version.
Experiment Setup | Yes | We used the discount factor, γ = 0.99, and learning rate parameter, α = 0.05. The weight vectors, wc or w, each of size 12, were initialized with 1's. The training example steps are drawn randomly with probability ϵ = 0.15 and from the current policy with probability 1 − ϵ.
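The quoted hyperparameters fit a standard linear Q-function approximation with ε-greedy action selection. The sketch below reconstructs that setup under this assumption; the function names and update rule are a generic TD(0) formulation, not the authors' code:

```python
import random

# Hyperparameters quoted in the paper.
GAMMA, ALPHA, EPSILON, N_FEATURES = 0.99, 0.05, 0.15, 12

def q_value(w, features):
    """Linear Q-function approximation: Q(s, a) = w · f(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def choose_action(w, actions, featurize, rng):
    """ε-greedy: a random action with probability ε,
    otherwise the action the current policy ranks highest."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(w, featurize(a)))

def q_update(w, features, reward, best_next_q):
    """One temporal-difference step on the weight vector."""
    td_error = reward + GAMMA * best_next_q - q_value(w, features)
    return [wi + ALPHA * td_error * fi for wi, fi in zip(w, features)]

# Weight vector of size 12, initialized with 1's, as quoted.
w = [1.0] * N_FEATURES
```

For example, a transition whose first feature is 1 (others 0), with reward 2 and no future value, has TD error 2 − 1 = 1, so only the first weight moves: 1.0 → 1.0 + 0.05 · 1 = 1.05.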