Feature Staleness Aware Incremental Learning for CTR Prediction
Authors: Zhikai Wang, Yanyan Shen, Zibin Zhang, Kangyi Lin
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct the experiments to evaluate the performance of our proposed method and answer the following questions: RQ1 Can the proposed FeSAIL outperform the existing IL methods on CTR prediction? RQ2 How do the SAR and SAS algorithms affect the effectiveness of FeSAIL? RQ3 How do different inverse correlation functions and biases influence the performance of FeSAIL? RQ4 How does the SAS deal with features with different staleness? |
| Researcher Affiliation | Collaboration | Zhikai Wang1, Yanyan Shen1, Zibin Zhang2 and Kangyi Lin2; 1Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2Tencent |
| Pseudocode | Yes | Algorithm 1: SAS algorithm |
| Open Source Code | Yes | The code can be found in https://github.com/cloudcatcher888/FeSAIL. |
| Open Datasets | Yes | Datasets. We use three real-world datasets, Criteo, iPinYou and Avazu, and one private industrial dataset collected from a commercial media platform. Criteo This dataset consists of 24 days of consecutive traffic logs from Criteo... iPinYou This dataset is a public real-world display ad dataset... Avazu This dataset contains users' click behaviours on displayed mobile ads... Media This is a real CTR prediction dataset collected from a commercial media platform. |
| Dataset Splits | Yes | In each time span t, we fine-tune the model parameters with Dt. We then test the BM on samples in Dt+1 which contain features with different staleness. |
| Hardware Specification | Yes | We implemented our FeSAIL approach using PyTorch 1.10 on a 64-bit Linux server equipped with 32 Intel Xeon@2.10GHz CPUs, 128GB memory and four RTX 2080 Ti GPUs. |
| Software Dependencies | Yes | We implemented our FeSAIL approach using PyTorch 1.10 |
| Experiment Setup | Yes | The sampling reservoir size L of Rt is the same size as the corresponding incremental dataset Dt. We use a grid search over the hidden layer size, initial learning rate and the number of cross layers. The batch size is set from 256 to 4096. The embedding size and hidden layer sizes are chosen from 32 to 1024. The η in Eq. (3) is from 5 to 10. We choose the Adam optimizer [Diederik P. Kingma, 2015] to train the model with a learning rate from 0.0001 to 0.001 and perform early stopping in the training process. |
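
The "Dataset Splits" row above describes a rolling protocol: the base model is fine-tuned on each incremental block Dt and then evaluated on the next block Dt+1, whose samples contain features of varying staleness. Below is a minimal sketch of that loop, assuming a generic PyTorch CTR model and a `blocks` list of per-span DataLoaders; the model, loader construction, and AUC metric are illustrative assumptions, not the authors' released code.

```python
import torch
from sklearn.metrics import roc_auc_score

def incremental_eval(model, blocks, epochs=1, lr=1e-3, device="cpu"):
    """Fine-tune on block D_t, then test on D_{t+1} (rolling protocol).

    `blocks` is assumed to be a list of DataLoaders, one per time span,
    each yielding (features, labels) batches -- an illustrative stand-in
    for the incremental datasets D_1..D_T described in the paper.
    """
    model.to(device)
    criterion = torch.nn.BCEWithLogitsLoss()
    aucs = []
    for t in range(len(blocks) - 1):
        # Fine-tune the base model on the current incremental block D_t.
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for x, y in blocks[t]:
                optimizer.zero_grad()
                logits = model(x.to(device)).squeeze(-1)
                loss = criterion(logits, y.float().to(device))
                loss.backward()
                optimizer.step()
        # Evaluate on the next block D_{t+1}, which contains features
        # of varying staleness relative to the data seen so far.
        model.eval()
        scores, labels = [], []
        with torch.no_grad():
            for x, y in blocks[t + 1]:
                scores.append(torch.sigmoid(model(x.to(device)).squeeze(-1)).cpu())
                labels.append(y.cpu())
        aucs.append(roc_auc_score(torch.cat(labels).numpy(),
                                  torch.cat(scores).numpy()))
    return aucs
```

In this sketch the optimizer is re-created per block, so each span is fine-tuned independently from the previous parameters; whether optimizer state is carried across spans is not specified in the quoted text.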
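The "Experiment Setup" row quotes a grid search over hidden layer size, initial learning rate and number of cross layers, with batch sizes 256 to 4096, embedding/hidden sizes 32 to 1024, learning rates 0.0001 to 0.001, the Adam optimizer, and early stopping. The sketch below encodes that search space and an early-stopping helper; the concrete grid points and the `train_and_validate` callable are placeholders, not the authors' exact configuration.

```python
import itertools

# Hyperparameter ranges quoted in the setup; the specific grid points
# listed here are illustrative, only the ranges come from the text.
SEARCH_SPACE = {
    "batch_size":     [256, 512, 1024, 2048, 4096],
    "embedding_size": [32, 64, 128, 256, 512, 1024],
    "hidden_size":    [32, 64, 128, 256, 512, 1024],
    "n_cross_layers": [1, 2, 3, 4],
    "learning_rate":  [1e-4, 5e-4, 1e-3],   # trained with Adam
    "eta":            [5, 7, 10],            # eta in Eq. (3), range 5-10
}

class EarlyStopper:
    """Stop training when validation AUC has not improved for `patience` epochs."""
    def __init__(self, patience=3):
        self.patience, self.best, self.bad_epochs = patience, float("-inf"), 0

    def should_stop(self, val_auc):
        if val_auc > self.best:
            self.best, self.bad_epochs = val_auc, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def grid_search(train_and_validate):
    """Exhaustive grid search; `train_and_validate(config) -> val_auc` is an
    assumed callable that trains one configuration (with Adam and early
    stopping) and returns its validation AUC."""
    best_auc, best_config = float("-inf"), None
    keys = list(SEARCH_SPACE)
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        config = dict(zip(keys, values))
        auc = train_and_validate(config)
        if auc > best_auc:
            best_auc, best_config = auc, config
    return best_config, best_auc
```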