Sequential Counterfactual Risk Minimization

Authors: Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal, Pierre Gaillard

ICML 2023

Reproducibility checklist (variable, result, and supporting LLM response):
Research Type: Experimental
Evidence: "We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM." "We also conduct numerical experiments to demonstrate the effectiveness of our method in both discrete and continuous action settings, and how it improves upon CRM and other existing methods in the literature." "In this section we perform numerical experiments to validate our method in practical settings."
Researcher Affiliation: Collaboration
Evidence: "Houssam Zenati 1 2, Eustache Diemert 1, Matthieu Martin 1, Julien Mairal 2, Pierre Gaillard 2. 1 Criteo AI Lab; 2 Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. Correspondence to: Houssam Zenati <housszenati@gmail.com>."
Pseudocode: Yes
Evidence: "Algorithm 1 Sequential Counterfactual Risk Minimization"
Open Source Code: Yes
Evidence: "All the code to reproduce the empirical results is available at: https://github.com/criteo-research/sequential-conterfactual-risk-minimization"
Open Datasets: Yes
Evidence: "Real-world datasets include Scene, Yeast and TMC2007 with feature space up to 30,438 dimensions and action space up to 2^22."
Dataset Splits: No
Evidence: The paper does not explicitly provide training/validation/test dataset splits, such as percentages or sample counts for each split. It mentions a 'test set' loss and training on the 'whole training data', but gives no specific split ratios.
Hardware Specification: No
Evidence: The paper notes that batch bandit baselines "did not finish (DNF) in 24h (per single run) on a 46 CPU / 500G RAM machine in most of our settings with large sample size n", but this describes a machine on which some baselines *failed* to run, not the hardware used for the authors' own experiments.
Software Dependencies: Yes
Evidence: "Eventually, the baselines were carefully optimized using the Jax library (https://github.com/google/jax) to allow for just-in-time compilation of algebraic blocks in both methods and to maximize their scaling capacity." "We use the stable_baselines3 (Raffin et al., 2021) library for the implementation."
Experiment Setup: Yes
Evidence: "We report in Figure 2 over M = 10 rollouts the mean test loss depending on sample size up to 2^10, with standard deviation estimated over 10 random runs." "Hyper-parameter selection for SCRM: in our experiments, hyperparameter selection consists in choosing a value for λ." "We also provide the grid of hyperparameters for the λ evaluated in CRM and SCRM methods: λ ∈ {1e-5, 1e-4, 1e-3, 1e-2, 1e-1}." "We combine the 2 last terms with a linear combination, with hyperparameters being tuned a posteriori, i.e. LOSS = MSE + λ · ENTROPY with the hyperparameter λ ∈ {0.5, 1, 2, 5, 10}."
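The quoted setup tunes a regularization weight λ over a small grid on top of a CRM-style objective. As a rough illustration of what such a grid sweep looks like, here is a minimal sketch of a penalized counterfactual risk (importance-weighted loss estimate plus a λ-scaled empirical-variance penalty, in the spirit of counterfactual risk minimization). The function name, the toy data, and the exact penalty form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def counterfactual_risk(losses, new_probs, logged_probs, lam):
    """Hypothetical CRM-style objective: an importance-weighted risk
    estimate plus a lambda-scaled empirical-variance penalty."""
    n = len(losses)
    weights = new_probs / logged_probs            # importance weights pi(a|x) / pi0(a|x)
    estimates = losses * weights                  # per-sample counterfactual loss estimates
    risk = estimates.mean()                       # IPS risk estimate
    penalty = np.sqrt(estimates.var(ddof=1) / n)  # empirical-variance penalty
    return risk + lam * penalty

# Toy logged bandit data (random placeholders, not from the paper).
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=100)          # observed losses of logged actions
logged_probs = rng.uniform(0.1, 1.0, size=100)    # propensities of the logging policy
new_probs = rng.uniform(0.1, 1.0, size=100)       # probabilities under the candidate policy

# The lambda grid reported for CRM/SCRM in the paper.
grid = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
objective_by_lam = {
    lam: counterfactual_risk(losses, new_probs, logged_probs, lam) for lam in grid
}
```

In the paper λ is selected a posteriori rather than by minimizing this objective directly; the sketch only shows the shape of the penalized objective being swept over the grid.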