Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Authors: Yewen Fan, Nian Si, Kun Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive numerical experiments to demonstrate the effectiveness of the proposed meta-algorithm in both synthetic datasets using a logistic regression model and a large-scale real-world dataset using a state-of-the-art recommendation neural network. |
| Researcher Affiliation | Academia | 1 Carnegie Mellon University 2 Mohamed bin Zayed University of Artificial Intelligence 3 University of Chicago Booth School of Business |
| Pseudocode | Yes | Algorithm 1 Variance-adjusting debiasing (VAD) method |
| Open Source Code | Yes | We open-sourced our implementation at https://github.com/tofuwen/VAD. |
| Open Datasets | Yes | We use the Criteo Ad Kaggle dataset³ to demonstrate our method's performance. The Criteo Ad Kaggle dataset is a common benchmark dataset for CTR predictions. ... ³ https://www.kaggle.com/c/criteo-display-ad-challenge |
| Dataset Splits | Yes | we use the first 15 million samples, shuffle the dataset randomly, and split the whole dataset into 85% train D_train, 1.5% validation-train D_val-train, 1.5% validation-test D_val-test, and 12% test D_test datasets. |
| Hardware Specification | No | The paper mentions 'computational constraints' but does not provide specific details on the hardware used for experiments, such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions software like DLRM, DeepCTR, and xDeepFM, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Throughout our experiments, we use the default parameters and a SGD optimizer. ... In our method, we only need to choose one hyper-parameter S ... All results reported in Section 6 use S = 2. |
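
The four-way split quoted in the Dataset Splits row (85% / 1.5% / 1.5% / 12% after a random shuffle) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `split_indices`, the seed, and the stand-in sample count of 100,000 are assumptions made here (the paper uses the first 15 million Criteo samples), and the released repository may split differently.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.85, 0.015, 0.015, 0.12), seed=0):
    """Shuffle indices 0..n_samples-1 and cut them into consecutive parts.

    Hypothetical helper: the name, default seed, and rounding rule are
    illustrative assumptions, not taken from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    # Cumulative cut points; the last part takes whatever remains.
    cuts = np.round(np.cumsum(fractions[:-1]) * n_samples).astype(int)
    return np.split(perm, cuts)

# Stand-in size for illustration; the paper uses 15 million samples.
names = ("train", "val_train", "val_test", "test")
parts = dict(zip(names, split_indices(100_000)))
sizes = {name: len(idx) for name, idx in parts.items()}
print(sizes)
```

Splitting a single shuffled permutation keeps the four subsets disjoint by construction, which matters here because the two small validation subsets are held out separately from both training and test data.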