Counterfactual Learning with General Data-Generating Policies
Authors: Yusuke Narita, Kyohei Okumura, Akihiro Shimizu, Kohei Yata
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate our method with experiments on partly and entirely deterministic logging policies." "Simulation Experiments. We validate our method with two simulation experiments." "Real-World Application. We empirically apply our method to evaluate and optimize coupon targeting policies." |
| Researcher Affiliation | Collaboration | Yusuke Narita (Yale University), Kyohei Okumura (Northwestern University), Akihiro Shimizu (Mercari, Inc.), Kohei Yata (University of Wisconsin-Madison) |
| Pseudocode | No | The paper describes its methods using mathematical formulas and prose, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'The full version of the paper, which includes technical appendices, can be found at https://arxiv.org/abs/2212.01925.' but provides no link to open-source code for the methodology and does not state that code has been released. |
| Open Datasets | No | The paper states 'Our application is based on proprietary data provided by Mercari Inc.' for its real-world application. For simulations, it describes how data was generated ('We generate a random sample...'), but does not use or provide access to a recognized public dataset. |
| Dataset Splits | No | The paper describes data generation processes for simulations and mentions 'training sample' for internal models, but it does not provide specific training/validation/test dataset splits (exact percentages, sample counts, or citations to predefined splits) for its experiments. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as 'sklearn’s Random Forest Regressor' and 'pylift for implementation' but specifies neither their version numbers nor the Python version, so the software dependencies cannot be pinned for reproduction. |
| Experiment Setup | Yes | "For the counterfactual policy π, we use D to train a model f(x, a) that predicts the reward given the context and action, using sklearn's Random Forest Regressor with 500 trees and otherwise default parameters." |
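
The quoted setup is concrete enough to sketch. Below is a minimal, hedged reproduction of that reward-model step: the synthetic contexts, binary action set, toy reward function, and the argmax policy `pi` are illustrative assumptions standing in for the paper's logged dataset D and its counterfactual policy; only the 500-tree Random Forest Regressor with otherwise default parameters comes from the quoted text.

```python
# A minimal sketch of the reported setup. The data below is synthetic and
# hypothetical; the paper's actual dataset D and action space are not public.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical logged data: contexts x, binary actions a, observed rewards r.
n, d = 1000, 5
X = rng.normal(size=(n, d))                       # contexts
A = rng.integers(0, 2, size=n)                    # actions from the logging policy
r = X[:, 0] * A + rng.normal(scale=0.1, size=n)   # toy reward model

# Reward model f(x, a): 500 trees, otherwise default parameters, as reported.
f = RandomForestRegressor(n_estimators=500, random_state=0)
f.fit(np.column_stack([X, A]), r)

# Illustrative counterfactual policy: pick the action with the higher
# predicted reward. This greedy rule is an assumption, not the paper's spec.
def pi(x):
    candidates = np.column_stack([np.tile(x, (2, 1)), np.array([0, 1])])
    return int(np.argmax(f.predict(candidates)))

print(pi(X[0]))
```

Concatenating the action to the context vector is one common way to fit a single f(x, a); the paper does not specify this encoding, so it is an assumption of the sketch.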