Feature Staleness Aware Incremental Learning for CTR Prediction
Authors: Zhikai Wang, Yanyan Shen, Zibin Zhang, Kangyi Lin
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the experiments to evaluate the performance of our proposed method and answer the following questions: RQ1 Can the proposed FeSAIL outperform the existing IL methods on CTR prediction? RQ2 How do the SAR and SAS algorithms affect the effectiveness of FeSAIL? RQ3 How do different inverse correlation functions and biases influence the performance of FeSAIL? RQ4 How does the SAS deal with features with different staleness? |
| Researcher Affiliation | Collaboration | Zhikai Wang1, Yanyan Shen1, Zibin Zhang2 and Kangyi Lin2; 1Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2Tencent |
| Pseudocode | Yes | Algorithm 1: SAS algorithm |
| Open Source Code | Yes | The code can be found in https://github.com/cloudcatcher888/FeSAIL. |
| Open Datasets | Yes | Datasets. We use three real-world datasets, Criteo, iPinYou and Avazu, and one private industrial dataset collected from a commercial media platform. Criteo This dataset consists of 24 days of consecutive traffic logs from Criteo... iPinYou This dataset is a public real-world display ad dataset... Avazu This dataset contains users' click behaviours on displayed mobile ads... Media This is a real CTR prediction dataset collected from a commercial media platform. |
| Dataset Splits | Yes | In each time span t, we fine-tune the model parameters with Dt. We then test the BM on samples in Dt+1 which contain features with different staleness. |
| Hardware Specification | Yes | We implemented our FeSAIL approach using PyTorch 1.10 on a 64-bit Linux server equipped with 32 Intel Xeon@2.10GHz CPUs, 128GB memory and four RTX 2080 Ti GPUs. |
| Software Dependencies | Yes | We implemented our FeSAIL approach using PyTorch 1.10 |
| Experiment Setup | Yes | The sampling reservoir size L of Rt is the same size as the corresponding incremental dataset Dt. We use a grid search over the hidden layer size, initial learning rate and the number of cross layers. The batch size is set from 256 to 4096. The embedding size and hidden layer sizes are chosen from 32 to 1024. The η in Eq. (3) is from 5 to 10. We choose the Adam optimizer [Diederik P. Kingma, 2015] to train the model with a learning rate from 0.0001 to 0.001 and perform early stopping in the training process. |
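
The "Dataset Splits" row above describes a rolling protocol: the base model is fine-tuned on each incremental block Dt and then evaluated on the next block Dt+1, whose samples contain features of varying staleness. Below is a minimal sketch of that loop, assuming a generic PyTorch CTR model and a `blocks` list of per-span DataLoaders; the model, loader construction, and AUC metric are illustrative assumptions, not the authors' released code.

```python
import torch
from sklearn.metrics import roc_auc_score

def incremental_eval(model, blocks, epochs=1, lr=1e-3, device="cpu"):
    """Fine-tune on block D_t, then test on D_{t+1} (rolling protocol).

    `blocks` is assumed to be a list of DataLoaders, one per time span,
    each yielding (features, labels) batches -- an illustrative stand-in
    for the incremental datasets D_1..D_T described in the paper.
    """
    model.to(device)
    criterion = torch.nn.BCEWithLogitsLoss()
    aucs = []
    for t in range(len(blocks) - 1):
        # Fine-tune the base model on the current incremental block D_t.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for x, y in blocks[t]:
                optimizer.zero_grad()
                logits = model(x.to(device)).squeeze(-1)
                loss = criterion(logits, y.float().to(device))
                loss.backward()
                optimizer.step()
        # Evaluate on the next block D_{t+1}, which contains features
        # of varying staleness relative to the data seen so far.
        model.eval()
        scores, labels = [], []
        with torch.no_grad():
            for x, y in blocks[t + 1]:
                scores.append(torch.sigmoid(model(x.to(device)).squeeze(-1)).cpu())
                labels.append(y.cpu())
        aucs.append(roc_auc_score(torch.cat(labels).numpy(),
                                  torch.cat(scores).numpy()))
    return aucs
```

In this sketch the optimizer is re-created per block, so each span is fine-tuned independently from the previous parameters; whether optimizer state is carried across spans is not specified in the quoted text.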
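The "Experiment Setup" row quotes a grid search over hidden layer size, initial learning rate and number of cross layers, with batch sizes 256 to 4096, embedding/hidden sizes 32 to 1024, learning rates 0.0001 to 0.001, the Adam optimizer, and early stopping. The sketch below encodes that search space and an early-stopping helper; the concrete grid points and the `train_and_validate` callable are placeholders, not the authors' exact configuration.

```python
import itertools

# Hyperparameter ranges quoted in the setup; the specific grid points
# listed here are illustrative, only the ranges come from the text.
SEARCH_SPACE = {
    "batch_size":     [256, 512, 1024, 2048, 4096],
    "embedding_size": [32, 64, 128, 256, 512, 1024],
    "hidden_size":    [32, 64, 128, 256, 512, 1024],
    "n_cross_layers": [1, 2, 3, 4],
    "learning_rate":  [1e-4, 5e-4, 1e-3],   # trained with Adam
    "eta":            [5, 7, 10],            # eta in Eq. (3), range 5-10
}

class EarlyStopper:
    """Stop training when validation AUC has not improved for `patience` epochs."""
    def __init__(self, patience=3):
        self.patience, self.best, self.bad_epochs = patience, float("-inf"), 0

    def should_stop(self, val_auc):
        if val_auc > self.best:
            self.best, self.bad_epochs = val_auc, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def grid_search(train_and_validate):
    """Exhaustive grid search; `train_and_validate(config) -> val_auc` is an
    assumed callable that trains one configuration (with Adam and early
    stopping) and returns its validation AUC."""
    best_auc, best_config = float("-inf"), None
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        config = dict(zip(keys, values))
        auc = train_and_validate(config)
        if auc > best_auc:
            best_auc, best_config = auc, config
    return best_config, best_auc
```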