Combinatorial Neural Bandits

Authors: Taehyun Hwang, Kyuwook Chai, Min-Hwan Oh

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Numerical Experiments: In this section, we perform numerical evaluations on CN-UCB and CN-TS. For each round in CN-TS, we draw M = 10 samples for each arm. We also present the performances of CN-TS (M = 1), which is a special case of CN-TS drawing only one sample per arm. We perform synthetic experiments and measure the cumulative regret of each algorithm."
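For context, CN-TS's optimistic sampling keeps the largest of M Gaussian score samples per arm, so M = 1 reduces to ordinary one-sample-per-arm sampling. A minimal sketch, assuming the per-arm mean scores `mu` and confidence widths `sigma` have already been computed from the network; the function name and Gaussian form are illustrative, not quoted from the paper:

```python
import numpy as np

def cn_ts_scores(mu, sigma, nu=1.0, M=10, rng=None):
    """Optimistic sampling: draw M score samples per arm from
    N(mu_i, (nu * sigma_i)^2) and keep the largest; M = 1 recovers
    plain one-sample-per-arm Thompson-style sampling."""
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.normal(mu[:, None], nu * sigma[:, None], size=(len(mu), M))
    return samples.max(axis=1)
```

The super arm is then chosen by the combinatorial oracle over these optimistic scores.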
Researcher Affiliation | Academia | "Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea. Correspondence to: Min-hwan Oh <minoh@snu.ac.kr>."
Pseudocode | Yes | "Algorithm 1 Combinatorial Neural UCB (CN-UCB)"
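Algorithm 1 appears in the paper only as pseudocode. A hedged sketch of its per-arm optimistic score, following the usual NeuralUCB-style form (network prediction plus a gradient-based confidence width); `model`, the inverse design matrix `Z_inv`, `gamma`, and the width `m` are assumed inputs, not quoted from the paper:

```python
import torch

def cn_ucb_score(model, x, Z_inv, gamma, m):
    """UCB score for one arm: f(x; theta) + gamma * ||g / sqrt(m)||_{Z^{-1}},
    where g is the gradient of the network output at context x."""
    f = model(x).squeeze()
    grads = torch.autograd.grad(f, list(model.parameters()))
    g = torch.cat([p.reshape(-1) for p in grads]) / m ** 0.5
    width = torch.sqrt(g @ Z_inv @ g)
    return (f + gamma * width).item()
```

In this family of algorithms, each chosen arm's scaled gradient g then updates the design matrix via Z <- Z + g g^T before the next round.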
Open Source Code | No | The paper contains no statement about releasing code and no link to a repository for the methodology.
Open Datasets | No | "We perform synthetic experiments and measure the cumulative regret of each algorithm. In Experiment 1, we compare our algorithms with contextual combinatorial bandits based on a linear assumption: CombLinUCB and CombLinTS (Wen et al., 2015). In Experiment 2, we demonstrate the empirical performances of our algorithms as the context dimension d increases. The contexts given to the agent in each round are randomly generated from a unit ball."
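The quote says only that contexts are "randomly generated from a unit ball". One standard way to draw vectors uniformly from the d-dimensional unit ball is sketched below; if the authors instead mean the unit sphere, drop the radial scaling:

```python
import numpy as np

def sample_contexts(n_arms, d, rng=None):
    """Uniform draws from the d-dimensional unit ball: a Gaussian
    direction scaled by U^(1/d), so radii follow the ball's volume law."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((n_arms, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * (rng.random(n_arms) ** (1.0 / d))[:, None]
```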
Dataset Splits | No | The paper mentions training a neural network and its parameters but does not specify any train/validation/test splits; it describes generating synthetic data instead.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU or GPU models).
Software Dependencies | No | "The activation function is the rectified linear unit (ReLU). We use the loss function in Eq. (4) and use stochastic gradient descent with a batch of 100 super arms."
Experiment Setup | Yes | "We use regularization parameter λ = 1 for all methods, confidence bound coefficient α = 1 for CombLinUCB and γ = 1 for CN-UCB, and exploration variance ν = 1 for CN-TS, CN-TS (M = 1), and CombLinTS. To estimate the score of each arm, we design a neural network with depth L = 2 and hidden layer width m = 100. The number of parameters is p = md + m = 8100 for Experiment 1, and p = 4100, 8100, 12100 for Experiment 2. The activation function is the rectified linear unit (ReLU). We use the loss function in Eq. (4) and use stochastic gradient descent with a batch of 100 super arms. We train the neural network every 10 rounds. The training epoch is 100, and the learning rate is 0.01."
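Most of this setup can be pieced together from the quote: a depth L = 2 ReLU network with bias-free layers gives exactly p = md + m parameters (p = 8100 at d = 80), and in neural bandit papers Eq. (4) is typically the squared loss plus an (mλ/2) proximity penalty to the random initialization. A minimal PyTorch sketch under those assumptions; the exact form of Eq. (4) is not quoted here:

```python
import torch
import torch.nn as nn

d, m, lam = 80, 100, 1.0          # Experiment 1: p = m*d + m = 8100

# depth L = 2 ReLU network; bias-free layers match the p = md + m count
model = nn.Sequential(nn.Linear(d, m, bias=False), nn.ReLU(),
                      nn.Linear(m, 1, bias=False))
theta0 = [p.detach().clone() for p in model.parameters()]
opt = torch.optim.SGD(model.parameters(), lr=0.01)

def train(contexts, rewards, epochs=100):
    """SGD on squared loss + (m * lam / 2) * ||theta - theta0||^2,
    an assumed reading of Eq. (4); the paper trains every 10 rounds."""
    for _ in range(epochs):
        opt.zero_grad()
        pred = model(contexts).squeeze(-1)
        loss = 0.5 * ((pred - rewards) ** 2).sum()
        loss = loss + 0.5 * m * lam * sum(
            ((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), theta0))
        loss.backward()
        opt.step()
```

Per the quote, each SGD step would operate on a batch of 100 super arms rather than the full-batch loss shown above.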