Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
Authors: Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this experiment, the objective is to show the effect of the smallest suboptimality gap ∆min on the theoretical gap-dependent regret upper bounds for ESCB-C and OLS-UCB-C. To that end, we sampled 100 environments with different ∆min, with a constant number of items d = 20, a horizon of T = 10^5 rounds, and randomly sampled structures. We represent the theoretical upper bounds with respect to 1/∆min in Fig. 1. We evaluate ESCB-C (approximated as proposed in Perrault et al., 2020b) and OLS-UCB-C on d = 5 items, P = 10 actions, T = 10^5 rounds, and randomly sampled structures. We represent the pseudo-regret evolutions in Fig. 2. The evolutions remain the same until 10^3 rounds. After that, ESCB-C seemingly performs better than OLS-UCB-C, which has a supplementary log(t) factor and is more conservative. However, just before 10^5 rounds, we can observe a slight regime change for ESCB-C, while the pseudo-regret of OLS-UCB-C continues to increase smoothly. The average regret of ESCB-C seems to have an inflection point upward to meet the q75 curve. |
| Researcher Affiliation | Collaboration | Julien Zhou Criteo AI Lab Paris, France Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France julien.zhou@inria.fr Pierre Gaillard Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France Thibaud Rahier Criteo AI Lab, Paris, France Houssam Zenati Université Paris-Saclay, Inria, Palaiseau, France Julyan Arbel Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France |
| Pseudocode | Yes | Algorithm 2 OLS-UCB-C; Algorithm 3 COS-V |
| Open Source Code | No | The paper does not contain any statement about releasing the code for OLS-UCB-C or COS-V, nor does it provide a link to a code repository. The NeurIPS checklist confirms: "The paper does not include experiments requiring code." |
| Open Datasets | No | The experimental section mentions "sampled 100 environments" and "randomly sampled structures", indicating synthetic or custom-generated data. However, it does not provide any link, DOI, or formal citation for public access to these environments or structures, nor does it refer to a well-known public dataset. |
| Dataset Splits | No | The paper describes experiments but does not specify any training/validation/test dataset splits. For bandit problems, data splitting is typically not done as in supervised learning; reproducibility would instead require details on how the environments were generated and used. |
| Hardware Specification | No | The paper includes an "Experimental results" section but does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running these experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers for its experiments or implementations. The NeurIPS checklist states "The paper does not include experiments," which contradicts the paper's "Experimental results" section but confirms the lack of software details. |
| Experiment Setup | No | The paper states "We evaluate ESCB-C (approximated as proposed in Perrault et al., 2020b) and OLS-UCB-C on d = 5 items, P = 10 actions, T = 10^5 rounds and randomly sampled structures." These are parameters of the simulation environment, not algorithm-specific hyperparameters or system-level settings such as confidence-bound constants, random seeds, or implementation details. |
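Since the paper releases no code, reproducing the pseudo-regret curves of Fig. 2 requires re-implementing the simulation loop from the textual description. The sketch below is a minimal, hypothetical reconstruction of such a combinatorial semi-bandit experiment (d items, P randomly sampled binary actions, pseudo-regret tracking). The index policy here is a plain per-item UCB stand-in, not the authors' OLS-UCB-C, whose confidence sets exploit the full covariance structure; all variable names and the reward model (Gaussian item rewards) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, P, T = 5, 10, 2000  # small horizon for illustration; the paper uses T = 10^5

# Randomly sampled structure: P binary action vectors over d items.
actions = (rng.random((P, d)) < 0.5).astype(float)
actions[actions.sum(axis=1) == 0, 0] = 1.0  # ensure every action plays >= 1 item

mu = rng.uniform(0.1, 0.9, size=d)  # assumed per-item mean rewards
best = (actions @ mu).max()         # expected reward of the best action

counts = np.ones(d)                 # one fake initialization pull per item
sums = rng.normal(mu, 0.1)
pseudo_regret = 0.0
for t in range(1, T + 1):
    # Per-item optimistic index (a simple stand-in for covariance-adaptive bounds).
    ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
    a = int(np.argmax(actions @ ucb))          # play the most optimistic action
    pseudo_regret += best - actions[a] @ mu    # pseudo-regret uses true means
    played = actions[a] > 0
    sums[played] += rng.normal(mu[played], 0.1)  # semi-bandit: per-item feedback
    counts[played] += 1
```

Under this setup, the cumulative pseudo-regret should grow sublinearly in T, matching the smooth curves described for Fig. 2, though the constants will differ from the paper's algorithms.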