Efficient Counterfactual Learning from Bandit Feedback
Authors: Yusuke Narita, Shota Yasui, Kohei Yata
AAAI 2019, pp. 4634-4641 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark. [...] Application: We empirically apply our estimators to evaluate and optimize the design of online advertisement formats. |
| Researcher Affiliation | Collaboration | Yusuke Narita, Yale University, yusuke.narita@yale.edu; Shota Yasui, Cyber Agent Inc., yasui_shota@cyberagent.co.jp; Kohei Yata, Yale University, kohei.yata@yale.edu |
| Pseudocode | No | The paper describes algorithmic steps and concepts in text but does not include any clearly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | Our application is based on proprietary data provided by Cyber Agent Inc. |
| Dataset Splits | Yes | Specifically, we use data logged during April 20-26, 2018 for estimating the best actions and data during April 27-29 for estimating the expected reward. This data separation allows us to avoid overfitting and overestimation of the CTR gains from the counterfactual policy. (A temporal-split sketch follows the table.) |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like "Gradient Boosting Machine (implemented by XGBoost)", "Ridge Logistic Regression", "Random Forest", and "Factorization Machine", but it does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | To implement this counterfactual policy, we estimate E[Y(a)|X] by ridge logistic regression for each action a and context X used by the logging policy (we apply one-hot encoding to categorical variables in X). (A per-action regression sketch follows the table.) |
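
The "Dataset Splits" row describes a purely temporal split: an earlier window for learning the best actions and a later window for estimating the counterfactual policy's expected reward. Below is a minimal sketch of that separation, assuming the logs live in a flat table with a `timestamp` column; the file path and schema are illustrative assumptions, not taken from the paper.

```python
import pandas as pd

# Hypothetical log schema: one row per impression, with a timestamp,
# context features, the chosen action, and the observed click (reward).
logs = pd.read_csv("ad_logs.csv", parse_dates=["timestamp"])  # path is illustrative

# Earlier window (April 20-26, 2018): used to estimate the best actions.
train = logs[(logs["timestamp"] >= "2018-04-20") &
             (logs["timestamp"] <  "2018-04-27")]

# Later window (April 27-29, 2018): held out to estimate the expected
# reward (CTR) of the learned counterfactual policy.
eval_ = logs[(logs["timestamp"] >= "2018-04-27") &
             (logs["timestamp"] <  "2018-04-30")]
```

Splitting on time rather than at random matches the stated rationale: the later window plays no role in choosing actions, so the estimated CTR gain on it is not inflated by overfitting.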
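The "Experiment Setup" row quotes the paper's reward-model recipe: a separate ridge logistic regression of the outcome on one-hot-encoded contexts for each action. Here is a hedged scikit-learn sketch of that recipe, continuing from the `train` frame above; the column names (`action`, `click`, and the context list) are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical context columns; the paper does not list its feature names.
context_cols = ["user_segment", "placement", "hour"]

models = {}
for a, df_a in train.groupby("action"):
    # One-hot encode the categorical contexts, then fit an L2-penalized
    # (ridge) logistic regression of the click outcome on the contexts,
    # separately for each logged action a.
    model = make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),
        LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    )
    model.fit(df_a[context_cols], df_a["click"])
    models[a] = model

def predicted_ctr(X_new: pd.DataFrame) -> pd.DataFrame:
    """Estimated E[Y(a) | X] for every logged action a."""
    return pd.DataFrame(
        {a: m.predict_proba(X_new[context_cols])[:, 1] for a, m in models.items()}
    )

# The estimated best action per context is the argmax across actions:
# best_action = predicted_ctr(eval_[context_cols]).idxmax(axis=1)
```

In scikit-learn the ridge (L2) penalty is controlled through `C`, the inverse of the regularization strength; the paper does not report the value it used, so `C=1.0` here is merely the library default.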