Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems

Authors: Yewen Fan, Nian Si, Kun Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive numerical experiments to demonstrate the effectiveness of the proposed meta-algorithm in both synthetic datasets using a logistic regression model and a large-scale real-world dataset using a state-of-the-art recommendation neural network.
Researcher Affiliation | Academia | Carnegie Mellon University; Mohamed bin Zayed University of Artificial Intelligence; University of Chicago Booth School of Business
Pseudocode | Yes | Algorithm 1: Variance-adjusting debiasing (VAD) method
Open Source Code | Yes | We open-sourced our implementation at https://github.com/tofuwen/VAD.
Open Datasets | Yes | We use the Criteo Ad Kaggle dataset [3] to demonstrate our method's performance. The Criteo Ad Kaggle dataset is a common benchmark dataset for CTR predictions. ... [3] https://www.kaggle.com/c/criteo-display-ad-challenge
Dataset Splits | Yes | We use the first 15 million samples, shuffle the dataset randomly, and split the whole dataset into 85% train (D_train), 1.5% validation-train (D_val-train), 1.5% validation-test (D_val-test), and 12% test (D_test) datasets.
Hardware Specification | No | The paper mentions 'computational constraints' but does not specify the hardware used for the experiments, such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions software such as DLRM, DeepCTR, and xDeepFM, but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Throughout our experiments, we use the default parameters and a SGD optimizer. ... In our method, we only need to choose one hyper-parameter S ... All results reported in Section 6 use S = 2.
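The Dataset Splits row becomes concrete once the arithmetic is written out. Applied to 15 million samples, the reported fractions give 12,750,000 training, 225,000 validation-train, 225,000 validation-test, and 1,800,000 test samples. The Python below is a minimal sketch under stated assumptions: a fixed shuffle seed (the paper does not report one), contiguous slices taken after shuffling, and any rounding remainder assigned to the training split. The helper names split_sizes and shuffle_and_split and the split labels are hypothetical and are not taken from the VAD repository.

```python
import random
from typing import Dict, Iterable, List

# Fractions as reported in the Dataset Splits row; the key names are hypothetical labels.
SPLIT_FRACTIONS: Dict[str, float] = {
    "train": 0.85,
    "val_train": 0.015,
    "val_test": 0.015,
    "test": 0.12,
}


def split_sizes(n: int, fractions: Dict[str, float] = SPLIT_FRACTIONS) -> Dict[str, int]:
    """Return the number of samples per split; any rounding remainder goes to 'train' (assumption)."""
    sizes = {name: round(n * frac) for name, frac in fractions.items()}
    sizes["train"] += n - sum(sizes.values())
    return sizes


def shuffle_and_split(
    samples: Iterable, fractions: Dict[str, float] = SPLIT_FRACTIONS, seed: int = 0
) -> Dict[str, List]:
    """Shuffle a copy of `samples` with a seeded RNG (assumed seed), then cut contiguous slices."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    splits: Dict[str, List] = {}
    start = 0
    for name, size in split_sizes(len(items), fractions).items():
        splits[name] = items[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    # Sizes for the first 15 million Criteo samples.
    print(split_sizes(15_000_000))
    # Small demo: split 1,000 placeholder sample IDs.
    demo = shuffle_and_split(range(1_000))
    print({name: len(part) for name, part in demo.items()})
```

In a real pipeline, the "first 15 million samples" would be selected before shuffling, for example by passing only the first 15 million rows of the Criteo training file to shuffle_and_split. Assigning the rounding remainder to the largest split keeps the smaller validation and test splits at their exact reported proportions.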