Capturing Delayed Feedback in Conversion Rate Prediction via Elapsed-Time Sampling
Authors: Jia-Qi Yang, Xiang Li, Shuguang Han, Tao Zhuang, De-Chuan Zhan, Xiaoyi Zeng, Bin Tong
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of ES-DFM, we conduct extensive experiments on a public dataset and a private industrial dataset. Experimental results confirm that our method consistently outperforms the previous state-of-the-art methods. |
| Researcher Affiliation | Collaboration | (1) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) Alibaba Group, Hangzhou, China |
| Pseudocode | No | The paper describes its proposed method using prose and mathematical equations but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing our results on the public dataset is available at https://github.com/ThyrixYang/es_dfm |
| Open Datasets | Yes | Public Dataset: We use the Criteo dataset (https://labs.criteo.com/2013/12/conversion-logs-dataset/) used in Chapelle (2014) to evaluate the proposed method. This dataset is formed from Criteo live traffic data over a period of 60 days and records conversions occurring after a click. Each sample is described by a set of hashed categorical features and a few continuous features. It also includes the timestamps of the clicks and those of the conversions, if any. |
| Dataset Splits | Yes | We divide both the public and the anonymous dataset evenly into two parts. We use the first part for model pre-training to obtain a well-initialized CVR prediction model, and the second part for streaming-data simulation to evaluate the compared methods. Following the online training manner of industrial systems, the model is trained on the t-th hour of data and tested on the (t+1)-th hour, then trained on the (t+1)-th hour and tested on the (t+2)-th hour, and so on. (A minimal sketch of this streaming protocol is given after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions optimizers and activation functions (e.g., Adam, Leaky ReLU, Batch Norm) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The model architecture is a simple MLP with hidden units fixed to [256, 256, 128] for all models. The activation functions are Leaky ReLU, and every hidden layer is followed by a Batch Norm layer (Ioffe and Szegedy 2015) to accelerate learning. Adam (Kingma and Ba 2015) is used as the optimizer with a learning rate of 10^-3. L2 regularization strength is 10^-6. (A model sketch under these settings is given after the table.) |
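
Below is a minimal PyTorch sketch of the reported experiment setup. It is an illustration, not the authors' released implementation: the input dimension, the placement of Batch Norm relative to the activation, and mapping the L2 strength onto Adam's `weight_decay` are all assumptions.

```python
import torch
import torch.nn as nn

class CVRNet(nn.Module):
    """MLP matching the reported setup: hidden units [256, 256, 128],
    Leaky ReLU activations, and a Batch Norm layer after each hidden layer."""

    def __init__(self, input_dim: int):
        super().__init__()
        layers, prev = [], input_dim
        for width in (256, 256, 128):
            # Linear -> BatchNorm -> LeakyReLU; the exact ordering of
            # BatchNorm and the activation is not stated in the paper.
            layers += [nn.Linear(prev, width), nn.BatchNorm1d(width), nn.LeakyReLU()]
            prev = width
        layers.append(nn.Linear(prev, 1))  # one logit: P(conversion | click)
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# input_dim is a placeholder; the paper does not report the feature dimension.
model = CVRNet(input_dim=128)
# Adam with learning rate 10^-3; L2 strength 10^-6 approximated via weight_decay.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)
```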
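
The streaming protocol from the Dataset Splits row can likewise be sketched as a loop over hourly slices: train on hour t, then test on hour t+1. The `hourly_batches` list, the single gradient step per hour, and AUC as the evaluation metric are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def streaming_simulation(model, optimizer, hourly_batches):
    """Train on the t-th hour, test on the (t+1)-th hour, and repeat.

    `hourly_batches` is a hypothetical list of (features, labels) tensor
    pairs, one per hour of the second (streaming) half of the dataset.
    """
    aucs = []
    for (x_t, y_t), (x_next, y_next) in zip(hourly_batches, hourly_batches[1:]):
        model.train()
        optimizer.zero_grad()
        logits = model(x_t).squeeze(-1)
        loss = F.binary_cross_entropy_with_logits(logits, y_t.float())
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            scores = torch.sigmoid(model(x_next).squeeze(-1))
        aucs.append(roc_auc_score(y_next.numpy(), scores.numpy()))
    return aucs
```

In practice each hour would be split into mini-batches, and ES-DFM's elapsed-time sampling would correct the delayed-feedback labels before training; this sketch omits both.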