MaskFusion: Feature Augmentation for Click-Through Rate Prediction via Input-adaptive Mask Fusion
Authors: Chao Liao, Jianchao Tan, Jiyuan Jia, Yi Guo, Chengru Song
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Mask Fusion achieves state-of-the-art (SOTA) performance on all seven benchmark deep CTR models with three public datasets. In this section, comprehensive experiments are conducted on 7 benchmark models to demonstrate the effectiveness and robustness of the Mask Fusion framework over 3 real-world datasets. |
| Researcher Affiliation | Industry | Chao Liao, Jianchao Tan, Jiyuan Jia, Yi Guo, Chengru Song Kuaishou Technology {liaochao, jianchaotan, jiajiyuan, guoyi03, songchengru}@kuaishou.com |
| Pseudocode | No | The paper describes the components and their operations using textual descriptions and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not include any explicit statements about releasing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate the Mask Fusion framework on three real-world commercial datasets: Criteo, Terabyte, and Avazu. The paper provides dataset URLs: https://www.kaggle.com/c/criteo-display-ad-challenge, https://labs.criteo.com/2013/12/download-terabyte-click-logs/, http://www.kaggle.com/c/avazu-ctr-prediction |
| Dataset Splits | Yes | Criteo... In experiments, the first 6 days of data are used as a training set and the rest as the test set. Terabyte... In experiments, the first 23 days of data are used as a training set and the last day of data as the test set. Avazu... The dataset is randomly split by 8:1:1 for training, validating, and testing. (An illustrative split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on one 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using an 'Adagrad optimizer Duchi et al. (2011)' but does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For numerical features, all of them will be concatenated and transformed into a low-dimensional, dense real-value vector by a 4-layer MLP; the number of neurons is [512, 256, 64, 16]. For categorical features, we will embed them into a dense real-value vector with a fixed dimension of 16. For optimization, we utilize an Adagrad optimizer Duchi et al. (2011) with a learning rate of 0.01, and the mini-batch size is 128. The depth of DNN is 3 for all models and the number of neurons is [512, 256, 1]. We simply use the MLP structure as the mask generator, and the structure of each mask generator is [512, 256]. (An illustrative configuration sketch follows the table.) |
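
The Avazu 8:1:1 random split quoted in the Dataset Splits row is straightforward to reproduce. Below is a minimal pandas sketch; the file name, the lack of stratification, and the random seed are assumptions, since the paper specifies only the ratios.

```python
import numpy as np
import pandas as pd

# File name is a hypothetical placeholder; the paper does not describe
# its preprocessing pipeline.
df = pd.read_csv("avazu_train.csv")

# Random 8:1:1 train/validation/test split, matching the ratios quoted
# in the Dataset Splits row. The seed is an assumption (none is given).
rng = np.random.default_rng(seed=0)
perm = rng.permutation(len(df))
n_train = int(0.8 * len(df))
n_val = int(0.1 * len(df))

train = df.iloc[perm[:n_train]]
val = df.iloc[perm[n_train:n_train + n_val]]
test = df.iloc[perm[n_train + n_val:]]
```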
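
The Experiment Setup row pins down layer widths, embedding size, and optimizer. The PyTorch sketch below wires those numbers together as one plausible reading; since the paper releases no code, the activation functions, the Criteo-style field counts (13 numerical, 26 categorical), and the point at which the generated mask is applied are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(dims):
    """Stack of Linear layers with ReLU between them; the activation
    choice is an assumption (the paper does not specify one)."""
    layers = []
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class CTRBackbone(nn.Module):
    """Input pipeline matching the Experiment Setup row above.

    How the mask output is fused back into the input is an assumption;
    the paper's actual MaskFusion wiring may differ.
    """
    def __init__(self, num_numerical, cat_vocab_sizes, embed_dim=16):
        super().__init__()
        # 4-layer MLP over concatenated numerical features: [512, 256, 64, 16]
        self.numeric_mlp = mlp([num_numerical, 512, 256, 64, 16])
        # One 16-dimensional embedding table per categorical field
        self.embeddings = nn.ModuleList(
            nn.Embedding(v, embed_dim) for v in cat_vocab_sizes
        )
        input_dim = 16 + embed_dim * len(cat_vocab_sizes)
        # Input-adaptive mask generator with hidden structure [512, 256],
        # projected back to the input width (the projection and sigmoid
        # gating are assumptions).
        self.mask_generator = nn.Sequential(
            mlp([input_dim, 512, 256]),
            nn.Linear(256, input_dim),
            nn.Sigmoid(),
        )
        # 3-layer prediction DNN: [512, 256, 1]
        self.dnn = mlp([input_dim, 512, 256, 1])

    def forward(self, x_num, x_cat):
        num = self.numeric_mlp(x_num)
        cats = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        h = torch.cat([num] + cats, dim=1)
        h = h * self.mask_generator(h)  # fuse the input-adaptive mask
        return torch.sigmoid(self.dnn(h))

# Criteo-style field counts (13 numerical, 26 categorical) and the vocab
# sizes are illustrative assumptions.
model = CTRBackbone(num_numerical=13, cat_vocab_sizes=[1000] * 26)
# Adagrad with lr 0.01 as quoted; mini-batch size 128 would be set in
# the DataLoader.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
```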