Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering

Authors: Liyao Li, Haobo Wang, Liangyu Zha, Qingyi Huang, Sai Wu, Gang Chen, Junbo Zhao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): "Extensive experiments show that FETCH systematically surpasses the current state-of-the-art AutoFE methods and validates the transferability of AutoFE pre-training."
Researcher Affiliation | Academia | College of Computer Science and Technology, Zhejiang University; Institute of Computing Innovation, Zhejiang University
Pseudocode | Yes | Algorithm 1: Training algorithm of FETCH
Open Source Code | Yes | Source code is available at https://github.com/liyaooi/FETCH, implemented in MindSpore.
Open Datasets | Yes | The experiments are conducted on 27 datasets, comprising 11 regression (R) and 16 classification (C) datasets drawn from OpenML, the UCI repository, and Kaggle. All datasets are available via the URL links in Appendix B.3.
Dataset Splits | Yes | Results are measured using 5-fold cross-validation, and a standardized cross-validation strategy is employed to improve the precision of the reward estimation.
Hardware Specification | Yes | All experiments are carried out on a server with Ubuntu 20.04.1 LTS, an NVIDIA GeForce RTX 3090 (24 GB GPU memory), an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz (64 cores), 256 GB of memory, and a 1 TB hard drive.
Software Dependencies | Yes | All experimental results are run with open-source code under Python 3.8.
Experiment Setup | Yes | RL-agent learning rate lr = 0.001; discount factor γ = 0.95. The hyperparameters of the Multi-Head Attention in the policy network are d_model = 64, n_head = 6, d_v = 32, and d_k = 32. The maximum number of search epochs N is limited to 300 (for both DIFER and FETCH); for NFS, N is set to 100 epochs, as its original paper requires. The number of samples per round, which is also the number of parallelized workers, is W = 24. The maximum feature order is K = 3.
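The Experiment Setup row can be collected into a single configuration object for a reproduction attempt. The following is a minimal Python sketch, not taken from the FETCH repository: the class name `FetchSetup`, its field names, and the helper `discounted_returns` are hypothetical. The helper assumes a standard Monte Carlo discounted return, G_t = r_t + γ·G_{t+1}, of the kind a policy-gradient agent uses; the report gives only γ = 0.95 and does not confirm that this is FETCH's exact update rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FetchSetup:
    """Hyperparameters as reported in the Experiment Setup row.

    Hypothetical container: field names are illustrative, not from the FETCH code.
    """
    lr: float = 0.001      # RL-agent learning rate
    gamma: float = 0.95    # discount factor
    d_model: int = 64      # multi-head attention model width
    n_head: int = 6        # number of attention heads
    d_k: int = 32          # per-head key/query width
    d_v: int = 32          # per-head value width
    max_epochs: int = 300  # search epochs N for DIFER and FETCH (100 for NFS)
    workers: int = 24      # samples (parallel workers) per round, W
    max_order: int = 3     # maximum feature order K

    def concat_head_width(self) -> int:
        # Head outputs are concatenated (n_head * d_v) and then projected back
        # to d_model, so d_k and d_v need not equal d_model / n_head.
        return self.n_head * self.d_v


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Assumed Monte Carlo return G_t = r_t + gamma * G_{t+1}, computed back to front."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For example, with `cfg = FetchSetup()`, calling `discounted_returns([0.0, 0.0, 1.0], cfg.gamma)` gives returns of roughly 0.9025, 0.95, and 1.0. The sketch shows how a reward at the last step is discounted back to earlier steps. Because `concat_head_width()` is 192, this configuration only makes sense if the attention layer projects the concatenated heads back to `d_model = 64`. A frozen dataclass keeps the reported values from being mutated during a run.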