Counterfactual Learning with General Data-Generating Policies

Authors: Yusuke Narita, Kyohei Okumura, Akihiro Shimizu, Kohei Yata

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We validate our method with experiments on partly and entirely deterministic logging policies." "Simulation Experiments. We validate our method with two simulation experiments." "Real-World Application. We empirically apply our method to evaluate and optimize coupon targeting policies." |
| Researcher Affiliation | Collaboration | Yusuke Narita (Yale University), Kyohei Okumura (Northwestern University), Akihiro Shimizu (Mercari, Inc.), Kohei Yata (University of Wisconsin-Madison) |
| Pseudocode | No | The paper describes its methods using mathematical formulas and prose, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states "The full version of the paper, which includes technical appendices, can be found at https://arxiv.org/abs/2212.01925" but does not link to open-source code for the methodology or state that code is released. |
| Open Datasets | No | The real-world application is based on proprietary data provided by Mercari, Inc. For the simulations, the paper describes how data was generated ("We generate a random sample...") but does not use or provide access to a recognized public dataset. |
| Dataset Splits | No | The paper describes data-generation processes for the simulations and mentions a "training sample" for internal models, but it does not provide specific training/validation/test splits (exact percentages, sample counts, or citations to predefined splits) for its experiments. |
| Hardware Specification | No | The paper does not report any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as sklearn's Random Forest Regressor and pylift, but does not specify their version numbers or the Python version, which would be needed for reproducible software dependencies. |
| Experiment Setup | Yes | "For the counterfactual policy π, we use D to train a model f(x, a) that predicts the reward given the context and action, using sklearn's Random Forest Regressor with 500 trees and otherwise default parameters." |
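The Experiment Setup row quotes the one concretely reproducible detail: a reward model f(x, a) fit with a 500-tree Random Forest. The sketch below illustrates that step under stated assumptions; the logged-data arrays (X, A, R), the context/action encoding, and the greedy derivation of the counterfactual policy pi from f are all hypothetical, since the paper specifies only the regressor and its tree count.

```python
# Minimal sketch of the quoted reward-model step, not the authors' code.
# Assumptions: array names X/A/R, the feature encoding (action appended as a
# column), and the greedy policy construction are illustrative choices.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical logged data D: contexts X, discrete actions A, rewards R.
n, d, n_actions = 1000, 5, 3
X = rng.normal(size=(n, d))
A = rng.integers(0, n_actions, size=n)
R = rng.normal(size=n)

# f(x, a): as quoted, "sklearn's Random Forest Regressor with 500 trees
# and otherwise default parameters", trained on (context, action) pairs.
model = RandomForestRegressor(n_estimators=500)
model.fit(np.column_stack([X, A]), R)

def pi(x):
    """Greedy counterfactual policy: pick the action with the highest
    predicted reward f(x, a). One plausible reading; the paper's exact
    construction of pi from f is not reproduced here."""
    candidates = np.column_stack([np.tile(x, (n_actions, 1)),
                                  np.arange(n_actions)])
    return int(np.argmax(model.predict(candidates)))

print(pi(X[0]))  # action chosen for the first logged context
```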