Sequential Counterfactual Risk Minimization
Authors: Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal, Pierre Gaillard
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM. We also conduct numerical experiments to demonstrate the effectiveness of our method in both discrete and continuous action settings, and how it improves upon CRM and other existing methods in the literature. In this section we perform numerical experiments to validate our method in practical settings. |
| Researcher Affiliation | Collaboration | Houssam Zenati (1,2), Eustache Diemert (1), Matthieu Martin (1), Julien Mairal (2), Pierre Gaillard (2). (1) Criteo AI Lab; (2) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. Correspondence to: Houssam Zenati <housszenati@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Sequential Counterfactual Risk Minimization. An illustrative sketch of the sequential loop is given below the table. |
| Open Source Code | Yes | All the code to reproduce the empirical results is available at: https://github.com/criteo-research/sequential-conterfactual-risk-minimization |
| Open Datasets | Yes | Real-world datasets include Scene, Yeast and TMC2007 with feature space up to 30,438 dimensions and action space up to 2^22. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, such as percentages or sample counts for each split. It mentions a 'test set' loss and training on the 'whole training data', but no specific split ratios are given for training or validation. |
| Hardware Specification | No | The paper mentions that batch bandit algorithms 'did not finish (DNF) in 24h (per single run) on a 46 CPU / 500G RAM machine in most of our settings with large sample size n', but this refers to a machine where some experiments *failed* to run, not a general hardware specification for their own experiments. |
| Software Dependencies | Yes | Eventually, the baselines were carefully optimized using the Jax library (https://github.com/google/jax) to allow for just-in-time compilation of algebraic blocks in both methods and to maximize their scaling capacity. We use the stable_baselines3 (Raffin et al., 2021) library for the implementation. A minimal JAX jit sketch is given below the table. |
| Experiment Setup | Yes | We report in Figure 2, over M = 10 rollouts, the mean test loss depending on sample size up to 2^10, with standard deviation estimated over 10 random runs. Hyper-parameter selection for SCRM: in our experiments, hyperparameter selection consists in choosing a value for λ. We also provide the grid of hyperparameters evaluated for λ in the CRM and SCRM methods: λ ∈ [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]. We combine the two last terms with a linear combination whose hyperparameters are tuned a posteriori, i.e. LOSS = MSE + λ · ENTROPY with the hyperparameter λ ∈ {0.5, 1, 2, 5, 10}. An illustrative λ grid-search sketch is given below the table. |
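
The Pseudocode row above points to Algorithm 1, but the table does not reproduce its steps. Below is a minimal, illustrative sketch (not the authors' implementation) of a sequential CRM loop on a toy discrete-action problem: the policy is repeatedly deployed, logged bandit feedback is collected, and a clipped importance-weighted risk with a variance penalty weighted by λ is re-minimized. The toy problem, the softmax policy parameterization, the clipping constant, the doubling sample schedule, and the helper names (`deploy`, `crm_objective`, `minimize`) are all our own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged-bandit problem: d-dimensional contexts, K discrete actions.
d, K = 5, 3
theta_true = rng.normal(size=(d, K))  # hidden parameters defining the loss

def sample_contexts(n):
    return rng.normal(size=(n, d))

def loss(x, a):
    # 0/1 loss: 0 iff the chosen action is the context-dependent best one.
    return 1.0 - (a == (x @ theta_true).argmax(axis=1)).astype(float)

def softmax_probs(theta, x):
    logits = x @ theta
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def deploy(theta, n):
    """Deploy the current policy and log (context, action, propensity, loss)."""
    x = sample_contexts(n)
    p = softmax_probs(theta, x)
    a = np.array([rng.choice(K, p=pi) for pi in p])
    return x, a, p[np.arange(n), a], loss(x, a)

def crm_objective(theta, batch, lam):
    """Clipped importance-weighted risk plus a variance penalty weighted by lam."""
    x, a, prop, y = batch
    w = np.clip(softmax_probs(theta, x)[np.arange(len(a)), a] / prop, 0.0, 100.0)
    r = w * y
    return r.mean() + lam * np.sqrt(r.var() / len(y))

def minimize(objective, theta0, steps=150, lr=0.1, eps=1e-4):
    """Crude finite-difference gradient descent; any smooth optimizer would do."""
    theta = theta0.copy()
    for _ in range(steps):
        base, grad = objective(theta), np.zeros_like(theta)
        for idx in np.ndindex(*theta.shape):
            t = theta.copy()
            t[idx] += eps
            grad[idx] = (objective(t) - base) / eps
        theta -= lr * grad
    return theta

# Sequential loop: deploy, collect a logged batch, re-fit, repeat with growing n.
theta, lam = np.zeros((d, K)), 1e-2
for m in range(1, 6):
    batch = deploy(theta, n=2 ** (m + 4))
    theta = minimize(lambda t: crm_objective(t, batch, lam), theta)
    x_test = sample_contexts(2000)
    a_test = softmax_probs(theta, x_test).argmax(axis=1)
    print(f"rollout {m}: mean test loss = {loss(x_test, a_test).mean():.3f}")
```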
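
The Software Dependencies row notes that the baselines were optimized with the JAX library for just-in-time compilation of algebraic blocks. As a hedged illustration of what such a block could look like (it is not the authors' code), here is a jitted clipped importance-weighted risk using only standard JAX calls (`jax.jit`, `jax.grad`, `jax.numpy`):

```python
import jax
import jax.numpy as jnp

@jax.jit
def clipped_ips_risk(pi_new, pi_log, losses, clip=100.0):
    """Clipped importance-weighted (IPS) risk estimate, compiled with XLA."""
    w = jnp.clip(pi_new / pi_log, 0.0, clip)
    return jnp.mean(w * losses)

# Gradients of the jitted block are available for any downstream optimizer.
risk_grad = jax.jit(jax.grad(clipped_ips_risk))

pi_new = jnp.array([0.2, 0.7, 0.1])   # new-policy propensities (toy values)
pi_log = jnp.array([0.3, 0.3, 0.4])   # logging-policy propensities
losses = jnp.array([1.0, 0.0, 1.0])   # observed losses
print(clipped_ips_risk(pi_new, pi_log, losses))  # compiled on first call
print(risk_grad(pi_new, pi_log, losses))
```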
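
The Experiment Setup row describes selecting the hyperparameter λ from a small grid, with results averaged over repeated runs. The sketch below shows one plausible way to organize that selection; `select_lambda`, `run_scrm`, and `evaluate` are hypothetical names, and the stand-in callables in the usage example are placeholders rather than the paper's training or evaluation code.

```python
import numpy as np

# Grid reported for CRM/SCRM in the Experiment Setup row.
LAMBDA_GRID = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]

def select_lambda(run_scrm, evaluate, grid=LAMBDA_GRID, n_runs=10, seed=0):
    """Pick the λ with the best mean evaluation loss over repeated runs."""
    results = {}
    for lam in grid:
        losses = [evaluate(run_scrm(lam, seed=seed + r)) for r in range(n_runs)]
        results[lam] = (np.mean(losses), np.std(losses))
    best = min(results, key=lambda lam: results[lam][0])
    return best, results

# Toy usage with stand-in callables (the real ones would train and score a policy).
best, table = select_lambda(
    run_scrm=lambda lam, seed: lam,                # pretend the "policy" is just λ
    evaluate=lambda policy: (policy - 1e-3) ** 2,  # pretend loss is minimized at λ = 1e-3
)
print(best)  # 0.001
```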