Randomized Confidence Bounds for Stochastic Partial Monitoring
Authors: Maxime Heuillet, Ola Ahmad, Audrey Durand
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed Rand CBP and Rand CBPside strategies have competitive performance against state-of-the-art baselines in multiple PM games. [...] We conduct experiments to validate the empirical performance of Rand CBP and Rand CBPside on the well-known Apple Tasting (AT) (Helmbold et al., 2000) (further studied in Raman et al., 2024) and Label Efficient (LE) (Helmbold et al., 1997) games. |
| Researcher Affiliation | Collaboration | ¹Université Laval, Canada; ²Thales Research and Technology (cortAIx), Canada; ³Canada-CIFAR AI Chair, Mila, Canada. |
| Pseudocode | Yes | Algorithm 1: CBP (Bartók et al., 2012b) and Rand CBP; Algorithm 2: Randomization Procedure; Algorithm 3: CBPside (Lienert, 2013) and Rand CBPside |
| Open Source Code | Yes | Our paper is the first to provide extensive reproducibility resources (open-source code for all strategies and environments, and game analyses in the Appendix) to facilitate future applied developments. Code is available at https://github.com/MaxHeuillet/partial-monitoring-algos. |
| Open Datasets | Yes | We conduct experiments to validate the empirical performance of Rand CBP and Rand CBPside on the well-known Apple Tasting (AT) (Helmbold et al., 2000) (further studied in (Raman et al., 2024)) and Label Efficient (LE) (Helmbold et al., 1997) games. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. It describes generating contexts uniformly and running experiments over a T=20k horizon, but no specific dataset partitioning for train/validation is mentioned. |
| Hardware Specification | Yes | Contextual and non-contextual experiments are run on machines with 48 CPUs, which justifies why we consider 96 runs rather than 100 (48 × 2 = 96 is the optimal allocation). |
| Software Dependencies | No | The paper mentions using "Gurobi (Gurobi Optimization, LLC, 2023) or PULP (Mitchell et al., 2011)" but does not provide explicit version numbers for these or other key software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | The number of samples for BPM-Least, TSPM and TSPM-Gaussian is set to 100. The strategies TSPM and TSPM-Gaussian are set with λ = 0.01... To compare CBP and Rand CBP fairly, both strategies are set with α = 1.01. Sampling in Rand CBP is performed according to the procedure described in Section 3.2 over K = 5 bins, with probability ε = 10⁻⁷ on the tail and standard deviation σ = 1... All contextual approaches use a regularization λ = 0.05. |
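The reported Rand CBP settings (K = 5 bins, tail probability ε = 10⁻⁷, σ = 1) suggest a discretized-Gaussian randomization over a small number of bins. The sketch below is a hypothetical reconstruction of such a sampler, not the paper's Algorithm 2: the bin placement and weighting scheme are assumptions, only the parameter values K, ε, and σ come from the quoted text.

```python
import numpy as np

def sample_randomized_bound(K=5, eps=1e-7, sigma=1.0, rng=None):
    """Hypothetical discretized-Gaussian sampler over K bins.

    Assumptions (not from the paper): bin centers span [0, sigma],
    interior weights follow a half-Gaussian shape, and the last
    (tail) bin is pinned to probability eps. Parameter values
    K=5, eps=1e-7, sigma=1 match the reported settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    centers = np.linspace(0.0, sigma, K)
    weights = np.exp(-0.5 * (centers / sigma) ** 2)  # half-Gaussian shape
    weights[-1] = 0.0                                # reserve the tail bin
    weights = (1.0 - eps) * weights / weights.sum()  # mass 1 - eps inside
    weights[-1] = eps                                # fixed tail mass eps
    return rng.choice(centers, p=weights)            # one randomized draw
```

Pinning a tiny fixed mass ε on the tail bin keeps rare large draws possible while concentrating most samples near zero, which is consistent with the role of randomized confidence-bound inflation described in the paper.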