Efficient Counterfactual Learning from Bandit Feedback

Authors: Yusuke Narita, Shota Yasui, Kohei Yata (pp. 4634-4641)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark. From the paper's Application section: We empirically apply our estimators to evaluate and optimize the design of online advertisement formats.
Researcher Affiliation | Collaboration | Yusuke Narita (Yale University, yusuke.narita@yale.edu); Shota Yasui (Cyber Agent Inc., yasui_shota@cyberagent.co.jp); Kohei Yata (Yale University, kohei.yata@yale.edu)
Pseudocode | No | The paper describes algorithmic steps and concepts in text but does not include any clearly labeled pseudocode blocks or algorithm figures.
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | Our application is based on proprietary data provided by Cyber Agent Inc.
Dataset Splits | Yes | Specifically, we use data logged during April 20-26, 2018 for estimating the best actions and data during April 27-29 for estimating the expected reward. This data separation allows us to avoid overfitting and overestimation of the CTR gains from the counterfactual policy. (A minimal date-split sketch appears after this table.)
Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., CPU or GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions software components like "Gradient Boosting Machine (implemented by XGBoost)", "Ridge Logistic Regression", "Random Forest", and "Factorization Machine", but it does not specify any version numbers for these or other software dependencies.
Experiment Setup | Yes | To implement this counterfactual policy, we estimate E[Y(a)|X] by ridge logistic regression for each action a and context X used by the logging policy (we apply one-hot encoding to categorical variables in X). (A per-action regression sketch also appears after this table.)
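
The Dataset Splits row describes a chronological split: actions are chosen on one week of logs and their reward (CTR) is estimated on later, unseen days, which matches the quoted rationale of avoiding overfitting and overestimated gains. Below is a minimal sketch of that split, assuming pandas and a hypothetical log file and `date` column; the actual Cyber Agent Inc. data is proprietary and its schema is not disclosed.

```python
import pandas as pd

# Hypothetical log of bandit feedback; the real data is proprietary.
logs = pd.read_csv("ad_logs.csv", parse_dates=["date"])  # assumed file and column name

# April 20-26, 2018: used to estimate the best actions (train the reward models).
train = logs[(logs["date"] >= "2018-04-20") & (logs["date"] < "2018-04-27")]

# April 27-29, 2018: used to estimate the expected reward of the counterfactual policy.
evaluation = logs[(logs["date"] >= "2018-04-27") & (logs["date"] < "2018-04-30")]
```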
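
For the Experiment Setup row, the sketch below illustrates one way to reproduce the quoted procedure with scikit-learn: an L2-penalized (ridge) logistic regression of the click indicator on one-hot-encoded context features, fit separately for each action, with the counterfactual policy picking the action with the highest predicted CTR. Column names (`action`, `click`, the categorical context features), the action set, and hyperparameters are assumptions for illustration; the paper does not release its implementation.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical context columns used by the logging policy.
categorical_cols = ["ad_genre", "user_segment"]

def fit_reward_models(train, actions):
    """Fit one ridge (L2-penalized) logistic regression of click on context per action."""
    models = {}
    for a in actions:
        sub = train[train["action"] == a]  # assumed column holding the logged action
        encoder = ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)]
        )
        model = make_pipeline(encoder, LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
        model.fit(sub[categorical_cols], sub["click"])  # assumed binary reward column
        models[a] = model
    return models

def best_actions(models, contexts):
    """Counterfactual policy: choose the action with the highest predicted CTR per context."""
    actions = list(models)
    ctr = np.column_stack(
        [models[a].predict_proba(contexts[categorical_cols])[:, 1] for a in actions]
    )
    return [actions[i] for i in ctr.argmax(axis=1)]
```

Combined with the date split above, `fit_reward_models(train, sorted(train["action"].unique()))` would be trained on the April 20-26 window, and the resulting policy's CTR would then be estimated on the April 27-29 window.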