Efficient Counterfactual Learning from Bandit Feedback

Authors: Yusuke Narita, Shota Yasui, Kohei Yata (pp. 4634-4641)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark. From the paper's Application section: We empirically apply our estimators to evaluate and optimize the design of online advertisement formats.
Researcher Affiliation | Collaboration | Yusuke Narita (Yale University, yusuke.narita@yale.edu); Shota Yasui (Cyber Agent Inc., yasui_shota@cyberagent.co.jp); Kohei Yata (Yale University, kohei.yata@yale.edu)
Pseudocode | No | The paper describes algorithmic steps and concepts in text but does not include any clearly labeled pseudocode blocks or algorithm figures.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | Our application is based on proprietary data provided by Cyber Agent Inc.
Dataset Splits | Yes | Specifically, we use data logged during April 20-26, 2018 for estimating the best actions and data during April 27-29 for estimating the expected reward. This data separation allows us to avoid overfitting and overestimation of the CTR gains from the counterfactual policy. (A minimal date-split sketch appears after this table.)
Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., CPU or GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions software components like "Gradient Boosting Machine (implemented by XGBoost)", "Ridge Logistic Regression", "Random Forest", and "Factorization Machine", but it does not specify any version numbers for these or other software dependencies.
Experiment Setup | Yes | To implement this counterfactual policy, we estimate E[Y(a)|X] by ridge logistic regression for each action a and context X used by the logging policy (we apply one-hot encoding to categorical variables in X). (A per-action regression sketch also appears after this table.)
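
The Dataset Splits row describes a chronological split: actions are chosen on one week of logs and their reward (CTR) is estimated on later, unseen days, which matches the quoted rationale of avoiding overfitting and overestimated gains. Below is a minimal sketch of that split, assuming pandas and a hypothetical log file and `date` column; the actual Cyber Agent Inc. data is proprietary and its schema is not disclosed.

```python
import pandas as pd

# Hypothetical log of bandit feedback; the real data is proprietary.
logs = pd.read_csv("ad_logs.csv", parse_dates=["date"])  # assumed file and column name

# April 20-26, 2018: used to estimate the best actions (train the reward models).
train = logs[(logs["date"] >= "2018-04-20") & (logs["date"] < "2018-04-27")]

# April 27-29, 2018: used to estimate the expected reward of the counterfactual policy.
evaluation = logs[(logs["date"] >= "2018-04-27") & (logs["date"] < "2018-04-30")]
```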
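
For the Experiment Setup row, the sketch below illustrates one way to reproduce the quoted procedure with scikit-learn: an L2-penalized (ridge) logistic regression of the click indicator on one-hot-encoded context features, fit separately for each action, with the counterfactual policy picking the action with the highest predicted CTR. Column names (`action`, `click`, the categorical context features), the action set, and hyperparameters are assumptions for illustration; the paper does not release its implementation.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical context columns used by the logging policy.
categorical_cols = ["ad_genre", "user_segment"]

def fit_reward_models(train, actions):
    """Fit one ridge (L2-penalized) logistic regression of click on context per action."""
    models = {}
    for a in actions:
        sub = train[train["action"] == a]  # assumed column holding the logged action
        encoder = ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)]
        )
        model = make_pipeline(encoder, LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
        model.fit(sub[categorical_cols], sub["click"])  # assumed binary reward column
        models[a] = model
    return models

def best_actions(models, contexts):
    """Counterfactual policy: choose the action with the highest predicted CTR per context."""
    actions = list(models)
    ctr = np.column_stack(
        [models[a].predict_proba(contexts[categorical_cols])[:, 1] for a in actions]
    )
    return [actions[i] for i in ctr.argmax(axis=1)]
```

Combined with the date split above, `fit_reward_models(train, sorted(train["action"].unique()))` would be trained on the April 20-26 window, and the resulting policy's CTR would then be estimated on the April 27-29 window.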