Feature Engineering for Predictive Modeling Using Reinforcement Learning
Authors: Udayan Khurana, Horst Samulowitz, Deepak Turaga
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested the impact of our FE on 48 publicly available datasets (different from the datasets used for training) from a variety of domains, and of various sizes. We report the accuracy of (a) the base dataset; (b) our FE routine with RL1, Bmax = 100; (c) Expansion-reduction: all transformations are first applied separately and added to the original columns, followed by a feature selection routine; (d) Random: a transformation function is applied to random feature(s) and the result added to the original dataset, measuring the CV performance; this is repeated 100 times, and finally all the new features whose cases showed an improvement in performance are kept, along with the original features, to train a model; (e) Tree-Heur: our implementation of Cognito's (Khurana et al. 2016b) global search heuristic for 100 nodes. We used Random Forest with default parameters as our learning algorithm for all the comparisons as it gave us the strongest baseline (no FE) average. A 5-fold cross validation using random stratified sampling was used. The results for a representative set of 24 of those datasets (due to lack of space) are captured in Table 1. (A minimal sketch of this evaluation protocol follows the table.) |
| Researcher Affiliation | Industry | Udayan Khurana ukhurana@us.ibm.com IBM Research AI Horst Samulowitz samulowitz@us.ibm.com IBM Research AI Deepak Turaga turaga@us.ibm.com IBM Research AI |
| Pseudocode | Yes | Algorithm 1 outlines the general methodology for exploration. At each step, an estimated reward from each possible move, R(G_i, n, t, i/Bmax), is used to rank the options of actions available at each given state of the transformation graph G_i, i ∈ [0, Bmax), where Bmax is the overall allocated budget in number of steps. (A sketch of this budget-limited exploration loop follows the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about open-source code availability or links to a code repository for the described methodology. |
| Open Datasets | Yes | We tested the impact of our FE on 48 publicly available datasets (different from the datasets used for training) from a variety of domains, and of various sizes. The results for a representative set of 24 of those datasets (due to lack of space) are captured in Table 1. |
| Dataset Splits | Yes | A 5-fold cross validation using random stratified sampling was used. |
| Hardware Specification | Yes | For reference to runtime, our FE took 4 minutes, 40 seconds to run for 100 nodes on the Bikeshare DC dataset, on a single thread on a 2.8GHz processor. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components used in the experiments. It only mentions using "Random Forest with default parameters" but no library name or version. |
| Experiment Setup | Yes | We used the discount factor γ = 0.99 and learning rate parameter α = 0.05. The weight vectors, w^c or w, each of size 12, were initialized with 1's. The training example steps are drawn randomly with probability ϵ = 0.15 and from the current policy with probability 1 − ϵ. (A minimal sketch of this configuration follows the table.) |
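
The evaluation protocol quoted above (a default-parameter Random Forest scored with 5-fold stratified cross-validation) maps naturally onto scikit-learn. The sketch below assumes scikit-learn, which the paper does not name; `load_dataset` is a hypothetical stand-in for fetching one of the benchmark datasets.

```python
# Minimal sketch of the reported evaluation protocol: a Random Forest with
# default parameters, scored by 5-fold stratified cross-validation.
# Assumption: scikit-learn (the paper names no library or version).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def evaluate(X, y, seed=0):
    model = RandomForestClassifier(random_state=seed)  # default parameters
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv).mean()  # mean CV accuracy

# X_base, y = load_dataset("bikeshare_dc")            # hypothetical loader
# print("base accuracy:", evaluate(X_base, y))
# print("FE accuracy:", evaluate(X_engineered, y))    # after feature engineering
```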
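
Algorithm 1 itself is not reproduced in this report, but the quoted description pins down its shape: at each of Bmax steps, rank every feasible (node, transformation) action on the current transformation graph by the estimated reward R(G_i, n, t, i/Bmax) and apply the best one. The sketch below is a reconstruction under that reading; the graph methods (`nodes`, `is_applicable`, `apply`) and `estimate_reward` are hypothetical names, not the paper's API.

```python
# Sketch of the budget-limited exploration loop behind Algorithm 1:
# greedily apply the transformation with the highest estimated reward
# until the step budget b_max is exhausted.
def explore(graph, transformations, estimate_reward, b_max):
    for i in range(b_max):
        budget_ratio = i / b_max  # fraction of the budget already spent
        candidates = [(estimate_reward(graph, n, t, budget_ratio), n, t)
                      for n in graph.nodes()            # hypothetical API
                      for t in transformations
                      if graph.is_applicable(n, t)]
        if not candidates:
            break
        _, node, transform = max(candidates, key=lambda c: c[0])
        graph = graph.apply(node, transform)  # adds the derived node
    return graph
```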
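
The experiment-setup row fixes the learning hyperparameters (γ = 0.99, α = 0.05, ϵ = 0.15, a 12-dimensional weight vector initialized to ones), which is enough to sketch the implied Q-learning update with linear function approximation. `featurize`, producing the 12-dimensional feature vector for a state/action pair, is a hypothetical placeholder for the paper's characteristics.

```python
# Minimal sketch of Q-learning with linear function approximation under the
# reported hyperparameters. `featurize(state, action)` is a hypothetical
# stand-in returning the 12-dimensional feature vector.
import random
import numpy as np

GAMMA, ALPHA, EPSILON, DIM = 0.99, 0.05, 0.15, 12
w = np.ones(DIM)  # weight vector initialized with 1's

def q_value(state, action):
    return w @ featurize(state, action)  # Q(s, a) = w . f(s, a)

def choose_action(state, actions):
    # epsilon-greedy: random step with probability epsilon,
    # current policy with probability 1 - epsilon
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_value(state, a))

def update(state, action, reward, next_state, next_actions):
    # one temporal-difference step; assumes next_actions is non-empty
    target = reward + GAMMA * max(q_value(next_state, a) for a in next_actions)
    td_error = target - q_value(state, action)
    w[:] = w + ALPHA * td_error * featurize(state, action)
```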