Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network

Authors: Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue (Steve) Liu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. We conduct experiments over three large-scale real-world benchmarks and compare accuracy and efficiency with state-of-the-art models.
Researcher Affiliation | Collaboration | Fuyuan Lyu1, Xing Tang2, Dugang Liu4, Chen Ma3, Weihong Luo2, Liang Chen2, Xiuqiang He2, Xue Liu1. 1McGill University; 2FiT, Tencent; 3City University of Hong Kong; 4Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ). Corresponding author. Emails: fuyuan.lyu@mail.mcgill.ca, xing.tang@hotmail.com, dugang.ldg@gmail.com, chenma@cityu.edu.hk, {lobbyluo,leocchen,xiuqianghe}@tencent.com, xue.liu@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1: Sparsification-based Selection Algorithm
Open Source Code | Yes | All source code is publicly available1. 1https://github.com/fuyuanlyu/OptFeature. These baselines and our OptFeature are available here8. 8https://github.com/fuyuanlyu/OptFeature
Open Datasets | Yes | To validate the effectiveness of our proposed method OptFeature, we conduct experiments on three benchmark datasets (Criteo, Avazu3, and KDD124), which are widely used in previous work on deep sparse networks for evaluation purposes [7, 11]. 3http://www.kaggle.com/c/avazu-ctr-prediction 4http://www.kddcup2012.org/c/kddcup2012-track2/data.
Table 3: Dataset Statistics
Dataset | #samples | #field | #value | pos ratio
Criteo | 4.6×10^7 | 39 | 6.8×10^6 | 0.2562
Avazu | 4.0×10^7 | 24 | 4.4×10^6 | 0.1698
KDD12 | 1.5×10^8 | 11 | 6.0×10^6 | 0.0445
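The Table 3 columns can be recomputed directly from raw data: #field counts distinct feature fields, #value counts distinct (field, value) pairs across the dataset, and pos ratio is the fraction of positive (clicked) labels. A minimal sketch, assuming each sample is a (label, feature-dict) pair; the field names below are hypothetical toy values, not taken from the actual datasets:

```python
def dataset_stats(samples):
    """Compute the Table 3-style statistics for a list of (label, feats) samples."""
    n = len(samples)
    fields, values = set(), set()
    positives = 0
    for label, feats in samples:
        positives += label                 # binary click label: 0 or 1
        for field, value in feats.items():
            fields.add(field)              # distinct feature fields
            values.add((field, value))     # distinct (field, value) pairs
    return {"#samples": n, "#field": len(fields),
            "#value": len(values), "pos_ratio": positives / n}

# Toy example with two fields and four distinct (field, value) pairs.
toy = [(1, {"site": "a", "device": "x"}),
       (0, {"site": "b", "device": "x"}),
       (0, {"site": "a", "device": "y"})]
stats = dataset_stats(toy)
# stats == {'#samples': 3, '#field': 2, '#value': 4, 'pos_ratio': 1/3}
```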
Dataset Splits | No | The paper mentions tuning hyperparameters on the validation set (Section 4.1) and using training and validation sets (Algorithm 1), and implicitly uses a test set for performance evaluation (Table 1). However, it does not provide the split percentages, sample counts, or the methodology used to create the train/validation/test splits for the datasets.
Hardware Specification | Yes | All experiments are conducted on a Linux server with one NVIDIA Tesla V100-PCIe-32GB GPU, 128GB main memory, and 8 Intel(R) Xeon(R) Gold 6140 CPU cores.
Software Dependencies | No | The paper states, "We implement our methods using PyTorch," and mentions using the Adam optimizer and the torchfm library, but it does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Parameter Setup: To ensure the reproducibility of experimental results, we further introduce the implementation settings in detail. We implement our methods using PyTorch. We adopt the Adam optimizer with a mini-batch size of 4096. We set the embedding size to 16 in all models and use an MLP predictor with layer sizes [1024, 512, 256] for all methods. All hyper-parameters are tuned on the validation set, with the learning rate chosen from [1e-3, 3e-4, 1e-4, 3e-5, 1e-5] and weight decay from [1e-4, 3e-5, 1e-5, 3e-6, 1e-6]. We also tune the learning rate for the feature interaction selection parameters from [1e-4, 3e-5, 1e-5, 3e-6, 1e-6] and its weight decay from [1e-4, 3e-5, 1e-5, 3e-6, 1e-6, 0].
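The reported setup (embedding size 16, MLP predictor [1024, 512, 256], Adam, mini-batch 4096) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' implementation: NUM_VALUES is a small placeholder (the real datasets have millions of feature values, per Table 3), and the (lr, weight_decay) pair shown is one point from the reported tuning grids.

```python
import torch
import torch.nn as nn

NUM_FIELDS = 39        # e.g. Criteo's field count (Table 3)
NUM_VALUES = 1000      # placeholder vocabulary size; real datasets use millions
EMBED_DIM = 16         # embedding size reported in the paper
MLP_DIMS = (1024, 512, 256)  # MLP predictor layer sizes reported in the paper

class DeepSparsePredictor(nn.Module):
    """Embedding table + MLP predictor over concatenated field embeddings."""
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(NUM_VALUES, EMBED_DIM)
        layers, in_dim = [], NUM_FIELDS * EMBED_DIM
        for out_dim in MLP_DIMS:
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
            in_dim = out_dim
        layers.append(nn.Linear(in_dim, 1))  # binary click logit
        self.mlp = nn.Sequential(*layers)

    def forward(self, ids):
        # ids: (batch, NUM_FIELDS) integer feature-value indices
        return self.mlp(self.embedding(ids).flatten(1)).squeeze(-1)

model = DeepSparsePredictor()
# One (lr, weight_decay) combination from the reported grids; each pair
# would be selected on the validation set during tuning.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

ids = torch.randint(0, NUM_VALUES, (8, NUM_FIELDS))  # tiny toy batch
logits = model(ids)                                  # shape: (8,)
```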