Randomized Confidence Bounds for Stochastic Partial Monitoring

Authors: Maxime Heuillet, Ola Ahmad, Audrey Durand

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed Rand CBP and Rand CBPside strategies have competitive performance against state-of-the-art baselines in multiple PM games. From Section 5 (Numerical Experiments): We conduct experiments to validate the empirical performance of Rand CBP and Rand CBPside on the well-known Apple Tasting (AT) (Helmbold et al., 2000) (further studied in (Raman et al., 2024)) and Label Efficient (LE) (Helmbold et al., 1997) games.
Researcher Affiliation | Collaboration | (1) Université Laval, Canada; (2) Thales Research and Technology (cortAIx), Canada; (3) Canada-CIFAR AI Chair, Mila, Canada.
Pseudocode | Yes | Algorithm 1: CBP (Bartók et al., 2012b) and Rand CBP; Algorithm 2: Randomization Procedure; Algorithm 3: CBPside (Lienert, 2013) and Rand CBPside
Open Source Code | Yes | Our paper is the first to provide extensive reproducibility resources (open-source code for all strategies and environments, and game analyses in the Appendix) to facilitate future applied developments. Code is available at https://github.com/MaxHeuillet/partial-monitoring-algos.
Open Datasets | Yes | We conduct experiments to validate the empirical performance of Rand CBP and Rand CBPside on the well-known Apple Tasting (AT) (Helmbold et al., 2000) (further studied in (Raman et al., 2024)) and Label Efficient (LE) (Helmbold et al., 1997) games.
Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. It describes generating contexts uniformly and running experiments over a horizon of T = 20k rounds, but no specific dataset partitioning for train/validation is mentioned.
Hardware Specification | Yes | Contextual and non-contextual experiments are run on machines with 48 CPUs, which justifies why we consider 96 runs rather than 100 (48 × 2 = 96 is the optimal allocation).
Software Dependencies | No | The paper mentions using "Gurobi (Gurobi Optimization, LLC, 2023) or PULP (Mitchell et al., 2011)" but does not provide explicit version numbers for these or other key software dependencies such as programming languages or libraries.
Experiment Setup | Yes | The number of samples for BPM-Least, TSPM and TSPM-Gaussian is set to 100. The strategies TSPM and TSPM-Gaussian are set with λ = 0.01... To compare CBP and Rand CBP fairly, both strategies are set with α = 1.01. Sampling in Rand CBP is performed according to the procedure described in Section 3.2 over K = 5 bins, with probability ε = 10⁻⁷ on the tail and standard deviation σ = 1... All contextual approaches use a regularization λ = 0.05.
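
As a companion to the setup quoted above, the following is a minimal Python sketch of a binned-Gaussian randomization using the reported hyperparameters (K = 5 bins, ε = 10⁻⁷ tail probability, σ = 1). The bin construction, the tail handling, and the way the draw inflates a deterministic confidence bound are illustrative assumptions, not a reproduction of the paper's Algorithm 2; consult the released code linked above for the exact procedure.

import numpy as np

def sample_randomized_bound(base_bound, K=5, eps=1e-7, sigma=1.0, rng=None):
    """Draw a randomized confidence bound from a half-Gaussian discretized into K bins.

    Assumed for illustration: bins cover [0, sigma], weights follow a half-Gaussian
    renormalized to 1 - eps, and a tail bin of probability eps yields the largest inflation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Discretize [0, sigma] into K bins and take their centers as candidate draws.
    edges = np.linspace(0.0, sigma, K + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Half-Gaussian weights on the bin centers, renormalized to total mass 1 - eps.
    weights = np.exp(-0.5 * (centers / sigma) ** 2)
    weights = (1.0 - eps) * weights / weights.sum()
    # Append a tail bin with probability eps that inflates the bound the most.
    values = np.append(centers, sigma)
    probs = np.append(weights, eps)
    z = rng.choice(values, p=probs)
    return base_bound * (1.0 + z)

# Example: randomize a deterministic confidence width of 0.3.
print(sample_randomized_bound(0.3, rng=np.random.default_rng(0)))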