Combinatorial Neural Bandits
Authors: Taehyun Hwang, Kyuwook Chai, Min-hwan Oh
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Numerical Experiments): "In this section, we perform numerical evaluations on CN-UCB and CN-TS. For each round of CN-TS, we draw M = 10 samples for each arm. We also present the performance of CN-TS(M=1), a special case of CN-TS that draws only one sample per arm. We perform synthetic experiments and measure the cumulative regret of each algorithm." (A hedged sketch of this sampling step appears below the table.) |
| Researcher Affiliation | Academia | Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea. Correspondence to: Min-hwan Oh <minoh@snu.ac.kr>. |
| Pseudocode | Yes | Algorithm 1: Combinatorial Neural UCB (CN-UCB). (A hedged sketch of this optimistic scoring step appears below the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing code or a link to a repository for the methodology. |
| Open Datasets | No | We perform synthetic experiments and measure the cumulative regret of each algorithm. In Experiment 1, we compare our algorithms with contextual combinatorial bandits based on a linear assumption: Comb Lin UCB and Comb Lin TS (Wen et al., 2015). In Experiment 2, we demonstrate the empirical performances of our algorithms as the context dimension d increases. The contexts given to the agent in each round are randomly generated from a unit ball. |
| Dataset Splits | No | The paper mentions training a neural network and its parameters but does not specify any train/validation/test splits; it only describes generating synthetic data. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models). |
| Software Dependencies | No | The paper reports only method-level details ("The activation function is the rectified linear unit (ReLU). We use the loss function in Eq. (4) and use stochastic gradient descent with a batch of 100 super arms.") and names no software libraries or versions. |
| Experiment Setup | Yes | We use regularization parameter λ = 1 for all methods, confidence-bound coefficient α = 1 for Comb Lin UCB, γ = 1 for CN-UCB, and exploration variance ν = 1 for CN-TS, CN-TS(M=1), and Comb Lin TS. To estimate the score of each arm, we design a neural network with depth L = 2 and hidden-layer width m = 100. The number of parameters is p = md + m = 8100 for Experiment 1, and p = 4100, 8100, 12100 for Experiment 2. The activation function is the rectified linear unit (ReLU). We use the loss function in Eq. (4) and stochastic gradient descent with a batch of 100 super arms. We train the neural network every 10 rounds, for 100 epochs with learning rate 0.01. (A hedged code sketch of this configuration appears below the table.) |
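
To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of the stated configuration: a depth L = 2 ReLU network of width m = 100, trained with SGD at learning rate 0.01 for 100 epochs every 10 rounds. The class and function names (`ScoreNetwork`, `retrain`) are hypothetical, d = 80 is inferred from p = md + m = 8100, and the squared-error objective is a stand-in for the paper's loss in Eq. (4), which the report quotes but does not reproduce.

```python
import torch
import torch.nn as nn

class ScoreNetwork(nn.Module):
    """Depth L = 2 ReLU network with hidden width m = 100 (hypothetical
    name). Without bias terms the parameter count is p = m*d + m,
    matching the report's p = 8100 when d = 80."""
    def __init__(self, d: int, m: int = 100):
        super().__init__()
        self.hidden = nn.Linear(d, m, bias=False)
        self.out = nn.Linear(m, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.hidden(x)))

d = 80  # context dimension inferred from p = md + m = 8100 (assumption)
net = ScoreNetwork(d)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)  # lr from the report

def retrain(contexts: torch.Tensor, rewards: torch.Tensor, epochs: int = 100):
    """Periodic retraining (every 10 rounds): 100 epochs of SGD over a
    batch of 100 super arms. Squared error stands in for Eq. (4)."""
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = ((net(contexts).squeeze(-1) - rewards) ** 2).mean()
        loss.backward()
        optimizer.step()
```

For Experiment 2, the same width m = 100 with d = 40, 80, 120 gives the reported parameter counts p = 4100, 8100, 12100.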
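
The Pseudocode row points to Algorithm 1 (CN-UCB), whose body the report does not reproduce. The sketch below shows the generic NeuralUCB-style optimistic score such an algorithm computes: the network estimate plus γ times a gradient-based confidence width. The diagonal design matrix `Z_diag` is a simplification chosen here for brevity, not the paper's construction.

```python
import torch

def cn_ucb_scores(net, contexts, Z_diag, gamma=1.0, m=100):
    """Sketch of an optimistic score in the style of CN-UCB (Algorithm 1):
    estimated score plus gamma times a gradient-based confidence width.
    `Z_diag` is a diagonal approximation of the regularized design matrix,
    an illustrative simplification of the paper's approach."""
    scores = []
    for x in contexts:                          # score one base arm at a time
        net.zero_grad()
        f = net(x.unsqueeze(0)).squeeze()
        f.backward()                            # gradient of the score w.r.t. theta
        g = torch.cat([p.grad.flatten() for p in net.parameters()])
        width = torch.sqrt((g * g / Z_diag).sum() / m)
        scores.append(f.item() + gamma * width.item())
    return torch.tensor(scores)

# Usage: start from Z_diag = lam * torch.ones(p) with lam = 1 as in the
# report, then update Z_diag += g * g / m for each arm actually played.
```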
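
The Research Type row quotes CN-TS drawing M = 10 samples per arm. In the standard neural Thompson sampling recipe, each arm's score is sampled from a Gaussian centered at the network estimate and the largest of the M draws is kept, so M = 1 recovers CN-TS(M=1). The per-arm width `sigma` below is a placeholder for the paper's gradient-based variance.

```python
import torch

def cn_ts_scores(net, contexts, sigma, nu=1.0, M=10):
    """Sketch of optimistic Thompson sampling in the style of CN-TS:
    draw M Gaussian samples per arm and keep the maximum. `sigma` is a
    per-arm width; the paper derives it from the network's gradients,
    while here it is simply passed in (an illustrative placeholder).
    nu = 1 is the exploration variance reported in the table."""
    with torch.no_grad():
        mean = net(contexts).squeeze(-1)               # estimated scores
    noise = torch.randn(M, mean.shape[0])              # M draws per arm
    samples = mean.unsqueeze(0) + nu * sigma.unsqueeze(0) * noise
    return samples.max(dim=0).values                   # optimistic score per arm

# The agent would then assemble a super arm from the top-scoring base
# arms and feed the observed rewards back into periodic retraining.
```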