Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering
Authors: Liyao Li, Haobo Wang, Liangyu Zha, Qingyi Huang, Sai Wu, Gang Chen, Junbo Zhao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "Extensive experiments show that FETCH systematically surpasses the current state-of-the-art AutoFE methods and validates the transferability of AutoFE pre-training." |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Zhejiang University; (2) Institute of Computing Innovation, Zhejiang University |
| Pseudocode | Yes | Algorithm 1 Training algorithm of FETCH |
| Open Source Code | Yes | Source code is available at https://github.com/liyaooi/FETCH, implemented with MindSpore. |
| Open Datasets | Yes | The experiments are conducted on 27 datasets, including 11 regression (R) datasets and 16 classification (C) datasets. These datasets are from OpenML, the UCI repository, and Kaggle. All datasets are available via the URL links in Appendix B.3. |
| Dataset Splits | Yes | "They are measured using 5-fold cross-validation." and "we employ a standardized cross-validation strategy to enhance the precision of the reward estimation." |
| Hardware Specification | Yes | All experiments are carried out on a server with Ubuntu 20.04.1 LTS, Nvidia GeForce RTX 3090 (24GB GPU memory), Intel(R) Xeon(R) CPU (Gold 5218R CPU @ 2.10GHz, 64 cores), 256GB memory and 1TB hard drive. |
| Software Dependencies | Yes | All experimental results are run with open-source code under the environment of Python 3.8. |
| Experiment Setup | Yes | RL-agent learning rate lr = 0.001, discount factor γ = 0.95. The hyperparameters of Multi-Head Attention in the policy network are as follows: d_model = 64, n_head = 6, d_v = 32, d_k = 32. The maximum number of search epochs N is limited to 300 (including DIFER and FETCH). Following the requirements stated in the NFS paper, we set N to 100 epochs for NFS. The number of samples per round, which is also the number of parallel workers, is W = 24. The maximum feature order is K = 3. |
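
The setup and split rows above can be gathered into one small configuration sketch. The values come from the table; the class name `FetchSetup`, its field names, and the helper `kfold_indices` are hypothetical and are not taken from the FETCH repository. The table does not say whether the 5 folds are shuffled or stratified, so the helper below assumes plain contiguous folds purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FetchSetup:
    """Hypothetical container: field names are illustrative, values are from the table."""
    learning_rate: float = 0.001  # RL-agent learning rate
    gamma: float = 0.95           # discount factor
    d_model: int = 64             # multi-head attention model width
    n_head: int = 6               # number of attention heads
    d_v: int = 32                 # value dimension
    d_k: int = 32                 # key dimension
    max_epochs: int = 300         # N for DIFER and FETCH (N = 100 for NFS)
    workers: int = 24             # W, samples (and parallel workers) per round
    max_order: int = 3            # K, maximum feature order
    cv_folds: int = 5             # 5-fold cross-validation


def kfold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, validation) index lists for contiguous, unshuffled folds.

    Assumption: no shuffling or stratification; the source does not specify either.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits


setup = FetchSetup()
splits = kfold_indices(n_samples=12, n_folds=setup.cv_folds)
```

Each element of `splits` is one fold: a downstream model would be fitted on the `train` indices and scored on the `val` indices, and the five scores averaged. Freezing the dataclass keeps the reported values from being changed by accident during a run.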