Physics-constrained Automatic Feature Engineering for Predictive Modeling in Materials Science

Authors: Ziyu Xiang, Mingzhou Fan, Guillermo Vázquez Tovar, William Trehern, Byung-Jun Yoon, Xiaofeng Qian, Raymundo Arroyave, Xiaoning Qian (pp. 10414-10421)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate our proposed AFE strategies, we perform experiments with three real-world materials science datasets: one for classification of metal/non-metal materials, one for regression of alloy elastic behavior based on alloy compositions, and a third for predicting a material's phase-transition temperature with physics constraints on feature groups."
Researcher Affiliation | Academia | ¹Electrical & Computer Engineering, ²Materials Science & Engineering, ³Computer Science & Engineering, Texas A&M University, College Station, Texas 77843; ⁴Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973
Pseudocode | Yes | Algorithm 1: DQN for Automatic Feature Engineering
Open Source Code | Yes | "Our code is open-source and available at https://github.com/ziyux/AFE."
Open Datasets | Yes | "The classification problem is based on a dataset of 10 prototype structures (NaCl, CsCl, ZnS, CaF2, Cr3Si, SiC, TiO2, ZnO, FeAs, NiAs) with a total number of 260 materials from one of the experiments reported in Ouyang et al. (2018)."
Dataset Splits | No | "first randomly splitting the dataset with 182 materials in the training set and the remaining 78 materials in the test set (7:3)"
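The reported 7:3 random split (182 train / 78 test out of 260 materials) can be sketched as follows; `split_dataset` is a hypothetical helper, not code from the paper's repository, and the seed is an arbitrary assumption since none is reported:

```python
import random

def split_dataset(n_samples=260, train_frac=0.7, seed=0):
    """Randomly split sample indices into train/test sets.

    With the paper's numbers (260 materials, 7:3 split) this yields
    182 training and 78 test indices.
    """
    idx = list(range(n_samples))
    rng = random.Random(seed)   # fixed seed only for repeatability here
    rng.shuffle(idx)
    n_train = round(n_samples * train_frac)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_dataset()
print(len(train_idx), len(test_idx))  # 182 78
```

Note that without a published seed (or saved index lists), the exact membership of the 182/78 split cannot be reproduced, which is presumably why the variable is marked "No".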
Hardware Specification | Yes | "We run all the experiments on the platform with the hardware configuration of Intel Xeon E5-2670, 64GB 1866MHz RAM and 2 NVIDIA K20 GPUs."
Software Dependencies | No | The paper describes the model architecture and hyperparameters but does not provide specific software dependencies (e.g., library names with version numbers) needed for replication, beyond the general statement "Our code is open-source and available at https://github.com/ziyux/AFE."
Experiment Setup | Yes | "For DQN, we have adopted a two-layer Q-network with the corresponding hidden dimensions {150, 120} and the ReLU activation function is used for both layers. The following hyperparameters are set for DQN training: learning rate: 0.001; experience replay batch size: 64; gamma: 0.99; epsilon: 1.0 (decay 0.99 and min 0.05)."
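The reported hyperparameters can be collected into a configuration sketch; the dictionary keys and the `epsilon_at` helper are hypothetical names for illustration (the paper does not specify whether epsilon decays per step or per episode — per-step decay is assumed here):

```python
# DQN hyperparameters as reported in the paper.
DQN_CONFIG = {
    "hidden_dims": (150, 120),  # two-layer Q-network, ReLU on both layers
    "lr": 0.001,                # learning rate
    "batch_size": 64,           # experience-replay batch size
    "gamma": 0.99,              # discount factor
    "eps_start": 1.0,           # initial epsilon-greedy exploration rate
    "eps_decay": 0.99,          # multiplicative decay per step (assumed)
    "eps_min": 0.05,            # exploration floor
}

def epsilon_at(step, cfg=DQN_CONFIG):
    """Epsilon-greedy exploration rate after `step` decay steps."""
    return max(cfg["eps_min"], cfg["eps_start"] * cfg["eps_decay"] ** step)

print(epsilon_at(0))    # 1.0
print(epsilon_at(500))  # clipped at the 0.05 floor
```

With decay 0.99 per step, epsilon reaches the 0.05 floor after roughly 300 steps (0.99^298 ≈ 0.05), so most of training runs at near-greedy action selection.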