Sequential Counterfactual Risk Minimization
Authors: Houssam Zenati, Eustache Diemert, Matthieu Martin, Julien Mairal, Pierre Gaillard
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide an empirical evaluation of our method in both discrete and continuous action settings, and demonstrate the benefits of multiple deployments of CRM. We also conduct numerical experiments to demonstrate the effectiveness of our method in both discrete and continuous action settings, and how it improves upon CRM and other existing methods in the literature. In this section we perform numerical experiments to validate our method in practical settings. |
| Researcher Affiliation | Collaboration | Houssam Zenati (1,2), Eustache Diemert (1), Matthieu Martin (1), Julien Mairal (2), Pierre Gaillard (2). (1) Criteo AI Lab; (2) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. Correspondence to: Houssam Zenati <housszenati@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 Sequential Counterfactual Risk Minimization. An illustrative sketch of the sequential loop is given below the table. |
| Open Source Code | Yes | All the code to reproduce the empirical results is available at: https://github.com/criteo-research/sequential-conterfactual-risk-minimization |
| Open Datasets | Yes | Real-world datasets include Scene, Yeast and TMC2007 with feature space up to 30,438 dimensions and action space up to 2^22. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, such as percentages or sample counts for each split. It mentions a 'test set' loss and training on the 'whole training data', but no specific split ratios are given for training or validation. |
| Hardware Specification | No | The paper mentions that batch bandit algorithms 'did not finish (DNF) in 24h (per single run) on a 46 CPU / 500G RAM machine in most of our settings with large sample size n', but this refers to a machine where some experiments *failed* to run, not a general hardware specification for their own experiments. |
| Software Dependencies | Yes | Eventually, the baselines were carefully optimized using the Jax library (https://github.com/google/jax) to allow for just-in-time compilation of algebraic blocks in both methods and to maximize their scaling capacity. We use the stable_baselines3 (Raffin et al., 2021) library for the implementation. A minimal JAX jit sketch is given below the table. |
| Experiment Setup | Yes | We report in Figure 2, over M = 10 rollouts, the mean test loss depending on sample size up to 2^10, with standard deviation estimated over 10 random runs. Hyper-parameter selection for SCRM: in our experiments, hyperparameter selection consists in choosing a value for λ. We also provide the grid of hyperparameters evaluated for λ in the CRM and SCRM methods: λ ∈ [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]. We combine the two last terms with a linear combination whose hyperparameters are tuned a posteriori, i.e. LOSS = MSE + λ · ENTROPY with the hyperparameter λ ∈ {0.5, 1, 2, 5, 10}. An illustrative λ grid-search sketch is given below the table. |
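
The Pseudocode row above points to Algorithm 1, but the table does not reproduce its steps. Below is a minimal, illustrative sketch (not the authors' implementation) of a sequential CRM loop on a toy discrete-action problem: the policy is repeatedly deployed, logged bandit feedback is collected, and a clipped importance-weighted risk with a variance penalty weighted by λ is re-minimized. The toy problem, the softmax policy parameterization, the clipping constant, the doubling sample schedule, and the helper names (`deploy`, `crm_objective`, `minimize`) are all our own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logged-bandit problem: d-dimensional contexts, K discrete actions.
d, K = 5, 3
theta_true = rng.normal(size=(d, K))  # hidden parameters defining the loss

def sample_contexts(n):
    return rng.normal(size=(n, d))

def loss(x, a):
    # 0/1 loss: 0 iff the chosen action is the context-dependent best one.
    return 1.0 - (a == (x @ theta_true).argmax(axis=1)).astype(float)

def softmax_probs(theta, x):
    logits = x @ theta
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def deploy(theta, n):
    """Deploy the current policy and log (context, action, propensity, loss)."""
    x = sample_contexts(n)
    p = softmax_probs(theta, x)
    a = np.array([rng.choice(K, p=pi) for pi in p])
    return x, a, p[np.arange(n), a], loss(x, a)

def crm_objective(theta, batch, lam):
    """Clipped importance-weighted risk plus a variance penalty weighted by lam."""
    x, a, prop, y = batch
    w = np.clip(softmax_probs(theta, x)[np.arange(len(a)), a] / prop, 0.0, 100.0)
    r = w * y
    return r.mean() + lam * np.sqrt(r.var() / len(y))

def minimize(objective, theta0, steps=150, lr=0.1, eps=1e-4):
    """Crude finite-difference gradient descent; any smooth optimizer would do."""
    theta = theta0.copy()
    for _ in range(steps):
        base, grad = objective(theta), np.zeros_like(theta)
        for idx in np.ndindex(*theta.shape):
            t = theta.copy()
            t[idx] += eps
            grad[idx] = (objective(t) - base) / eps
        theta -= lr * grad
    return theta

# Sequential loop: deploy, collect a logged batch, re-fit, repeat with growing n.
theta, lam = np.zeros((d, K)), 1e-2
for m in range(1, 6):
    batch = deploy(theta, n=2 ** (m + 4))
    theta = minimize(lambda t: crm_objective(t, batch, lam), theta)
    x_test = sample_contexts(2000)
    a_test = softmax_probs(theta, x_test).argmax(axis=1)
    print(f"rollout {m}: mean test loss = {loss(x_test, a_test).mean():.3f}")
```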
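
The Software Dependencies row notes that the baselines were optimized with the JAX library for just-in-time compilation of algebraic blocks. As a hedged illustration of what such a block could look like (it is not the authors' code), here is a jitted clipped importance-weighted risk using only standard JAX calls (`jax.jit`, `jax.grad`, `jax.numpy`):

```python
import jax
import jax.numpy as jnp

@jax.jit
def clipped_ips_risk(pi_new, pi_log, losses, clip=100.0):
    """Clipped importance-weighted (IPS) risk estimate, compiled with XLA."""
    w = jnp.clip(pi_new / pi_log, 0.0, clip)
    return jnp.mean(w * losses)

# Gradients of the jitted block are available for any downstream optimizer.
risk_grad = jax.jit(jax.grad(clipped_ips_risk))

pi_new = jnp.array([0.2, 0.7, 0.1])   # new-policy propensities (toy values)
pi_log = jnp.array([0.3, 0.3, 0.4])   # logging-policy propensities
losses = jnp.array([1.0, 0.0, 1.0])   # observed losses
print(clipped_ips_risk(pi_new, pi_log, losses))  # compiled on first call
print(risk_grad(pi_new, pi_log, losses))
```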
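
The Experiment Setup row describes selecting the hyperparameter λ from a small grid, with results averaged over repeated runs. The sketch below shows one plausible way to organize that selection; `select_lambda`, `run_scrm`, and `evaluate` are hypothetical names, and the stand-in callables in the usage example are placeholders rather than the paper's training or evaluation code.

```python
import numpy as np

# Grid reported for CRM/SCRM in the Experiment Setup row.
LAMBDA_GRID = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]

def select_lambda(run_scrm, evaluate, grid=LAMBDA_GRID, n_runs=10, seed=0):
    """Pick the λ with the best mean evaluation loss over repeated runs."""
    results = {}
    for lam in grid:
        losses = [evaluate(run_scrm(lam, seed=seed + r)) for r in range(n_runs)]
        results[lam] = (np.mean(losses), np.std(losses))
    best = min(results, key=lambda lam: results[lam][0])
    return best, results

# Toy usage with stand-in callables (the real ones would train and score a policy).
best, table = select_lambda(
    run_scrm=lambda lam, seed: lam,                # pretend the "policy" is just λ
    evaluate=lambda policy: (policy - 1e-3) ** 2,  # pretend loss is minimized at λ = 1e-3
)
print(best)  # 0.001
```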