Combinatorial Multi-Armed Bandit with General Reward Functions

Authors: Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Even when we use the simple greedy algorithm as the oracle, our experiments show that SDCB performs significantly better than the algorithm in [26] (see the supplementary material)."
Researcher Affiliation | Collaboration | Microsoft Research (weic@microsoft.com); Princeton University (huwei@cs.princeton.edu); The University of Texas at Austin (fuli.theory.research@gmail.com); Tsinghua University (lapordge@gmail.com); Tsinghua University (liuyujyyz@gmail.com); Shanghai University of Finance and Economics (lu.pinyan@mail.shufe.edu.cn). The authors are listed in alphabetical order.
Pseudocode | Yes | Algorithm 1: SDCB (stochastically dominant confidence bound); Algorithm 2: Lazy-SDCB with known time horizon; Algorithm 3: Lazy-SDCB without knowing the time horizon.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper mentions applications such as online advertising, online recommendation, and wireless routing, and discusses the K-MAX problem and Expected Utility Maximization problems, but it does not specify any datasets used for its experiments or provide information on their public availability.
Dataset Splits | No | The paper does not provide specific details on train/validation/test dataset splits. While it mentions "our experiments", it does not specify any dataset or how it was split.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., libraries, frameworks, or specific solvers).
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and proofs. It does not provide specific experimental setup details such as hyperparameters, learning rates, batch sizes, or training configurations.
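The pseudocode row above names SDCB, which replaces per-arm confidence bounds with a stochastically dominant (optimistic) outcome distribution fed to an offline oracle. The following is a minimal illustrative sketch only, not the authors' code: it assumes outcomes in [0, 1], a linear (sum) reward, and a greedy top-k oracle; the confidence radius sqrt(3 ln t / 2n) follows the usual UCB-style scaling, and all function names are made up for this example.

```python
import math
import random

def dominant_cdf(samples, t, top=1.0):
    """Optimistic distribution over the observed support plus the known
    maximum outcome `top`: subtract a confidence radius from the empirical
    CDF (clipping at 0), which implicitly shifts the removed mass to `top`,
    so the result stochastically dominates the empirical distribution."""
    n = len(samples)
    rad = math.sqrt(3.0 * math.log(t) / (2.0 * n))
    xs = sorted(set(samples) | {top})
    counts = {x: 0 for x in xs}
    for s in samples:
        counts[s] += 1
    cdf, cum = {}, 0
    for x in xs:
        cum += counts[x]
        cdf[x] = max(cum / n - rad, 0.0)
    cdf[top] = 1.0  # all probability mass is accounted for at the maximum
    return cdf

def mean_of(cdf):
    """Expectation of a discrete distribution given its CDF over sorted support."""
    prev, m = 0.0, 0.0
    for x in sorted(cdf):
        m += x * (cdf[x] - prev)
        prev = cdf[x]
    return m

def sdcb(draw, m, k, horizon, rng):
    """Pull every arm once, then each round: build dominant CDFs, let a
    greedy top-k oracle pick a super-arm, play it, record the outcomes.
    Returns (cumulative reward, per-arm pull counts)."""
    samples = [[draw(i, rng)] for i in range(m)]  # initialization round
    total = sum(s[0] for s in samples)
    for t in range(m + 1, horizon + 1):
        opt_means = [mean_of(dominant_cdf(samples[i], t)) for i in range(m)]
        action = sorted(range(m), key=lambda i: -opt_means[i])[:k]
        for i in action:
            x = draw(i, rng)
            samples[i].append(x)
            total += x  # linear (sum) reward for this sketch
    return total, [len(s) for s in samples]
```

On Bernoulli arms this reduces to a UCB-style rule (optimistic mean = empirical mean plus radius), so under-performing arms are quickly abandoned; the general-reward setting in the paper is what motivates tracking the full empirical distribution rather than just the mean.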