MaskFusion: Feature Augmentation for Click-Through Rate Prediction via Input-adaptive Mask Fusion

Authors: Chao Liao, Jianchao Tan, Jiyuan Jia, Yi Guo, Chengru Song

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Mask Fusion achieves state-of-the-art (SOTA) performance on all seven benchmark deep CTR models with three public datasets. and In this section, comprehensive experiments are conducted on 7 benchmark models to demonstrate the effectiveness and robustness of the Mask Fusion framework over 3 real-world datasets.
Researcher Affiliation | Industry | Chao Liao, Jianchao Tan, Jiyuan Jia, Yi Guo, Chengru Song, Kuaishou Technology, {liaochao, jianchaotan, jiajiyuan, guoyi03, songchengru}@kuaishou.com
Pseudocode | No | The paper describes the components and their operations using textual descriptions and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not include any explicit statements about releasing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate the Mask Fusion framework on three real-world commercial datasets: Criteo, Terabyte, and Avazu. and provides URLs: https://www.kaggle.com/c/criteo-display-ad-challenge, https://labs.criteo.com/2013/12/download-terabyte-click-logs/, http://www.kaggle.com/c/avazu-ctr-prediction
Dataset Splits | Yes | Criteo: In experiments, the first 6 days of data are used as a training set and the rest as the test set. Terabyte: In experiments, the first 23 days of data are used as a training set and the last day of data as the test set. Avazu: The dataset is randomly split by 8:1:1 for training, validating, and testing.
Hardware Specification | Yes | All experiments are conducted on one 2080Ti GPU.
Software Dependencies | No | The paper mentions using an 'Adagrad optimizer Duchi et al. (2011)' but does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For numerical features, all of them will be concatenated and transformed into a low-dimensional, dense real-value vector by a 4-layer MLP; the number of neurons is [512, 256, 64, 16]. For categorical features, we will embed them into a dense real-value vector with a fixed dimension of 16. For optimization, we utilize an Adagrad optimizer Duchi et al. (2011) with a learning rate of 0.01, and the mini-batch size is 128. The depth of DNN is 3 for all models and the number of neurons is [512, 256, 1]. We simply use the MLP structure as the mask generator, and the structure of each mask generator is [512, 256].
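The Avazu split protocol quoted above (a random 8:1:1 partition into training, validation, and test sets) can be sketched in plain Python. The function name and seeding are illustrative assumptions; the paper does not publish its splitting code.

```python
import random

def split_8_1_1(samples, seed=42):
    """Randomly partition samples into train/val/test at an 8:1:1 ratio,
    mirroring the Avazu protocol described in the paper. Names are
    illustrative; the paper provides no reference implementation."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = int(len(samples) * 0.8)
    n_val = int(len(samples) * 0.1)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Note that Criteo and Terabyte use temporal splits (first N days for training, remainder for testing), so this shuffle-based helper applies only to Avazu.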
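The experiment-setup row can be made concrete with a minimal NumPy sketch of the described input pipeline: a 4-layer MLP with neurons [512, 256, 64, 16] for the concatenated numerical features, 16-dimensional embeddings for categorical features, and an MLP mask generator of structure [512, 256] whose output scales the embeddings. Feature counts, vocabulary size, the ReLU activation, the mask generator's output dimension, and all function names are assumptions for illustration only — the paper releases no code.

```python
import numpy as np

def mlp_forward(x, layer_sizes, rng):
    """Apply an MLP with the given output sizes; weights are random
    placeholders for this sketch (ReLU activation is an assumption)."""
    for out_dim in layer_sizes:
        w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
        x = np.maximum(x @ w, 0.0)
    return x

rng = np.random.default_rng(0)
# Batch size 128 per the paper; field counts and vocab size are hypothetical.
batch, num_numerical, num_categorical, embed_dim, vocab = 128, 13, 26, 16, 1000

# Numerical features: concatenate, then a 4-layer MLP [512, 256, 64, 16].
numerical = rng.standard_normal((batch, num_numerical))
numerical_vec = mlp_forward(numerical, [512, 256, 64, 16], rng)

# Categorical features: one 16-dim embedding table per field.
tables = [rng.standard_normal((vocab, embed_dim)) for _ in range(num_categorical)]
ids = rng.integers(0, vocab, size=(batch, num_categorical))
cat_vecs = np.stack([tables[f][ids[:, f]] for f in range(num_categorical)], axis=1)

# Mask generator: a [512, 256] MLP; here it emits one mask value per
# embedding dimension (the output dimension is an assumption).
flat = cat_vecs.reshape(batch, -1)
mask = mlp_forward(flat, [512, 256, num_categorical * embed_dim], rng)
masked = cat_vecs * mask.reshape(cat_vecs.shape)
```

The final DNN ([512, 256, 1]) and the Adagrad optimizer (learning rate 0.01) from the quoted setup would sit downstream of `masked` and `numerical_vec`; they are omitted here to keep the sketch focused on the feature-side shapes.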