No-Regret Bandit Exploration based on Soft Tree Ensemble Model

Authors: Shogo Iwazaki, Shinya Suzumura

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our algorithm achieves a smaller cumulative regret compared to the existing ReLU-based neural bandit algorithms. We also show that this advantage comes with a trade-off: the hypothesis space of the soft tree ensemble model is more constrained than that of a ReLU-based neural network. In Sec. 4, we elucidate the distinctions in properties and assumptions between the existing NN-UCB and ST-UCB algorithms. Specifically, while NN-UCB generally lacks a no-regret guarantee in general action (or context) spaces, ST-UCB consistently offers a no-regret guarantee across general action spaces. Additionally, we examine the relation between the hypothesis spaces induced by the TNTK and those induced by the NTK using ReLU activation. This comparison reveals that the hypothesis space derived from soft trees, although more constrained, may lead to lower regret. In this section, we compare ST-UCB and NN-UCB to empirically demonstrate the usefulness of the tree-based model.
Researcher Affiliation | Industry | Shogo Iwazaki, LY Corporation, Tokyo, Japan, siwazaki@lycorp.co.jp; Shinya Suzumura, LY Corporation, Tokyo, Japan, ssuzmura@lycorp.co.jp
Pseudocode | Yes | Algorithm 1: The soft tree-based upper confidence bound (ST-UCB) algorithm. Algorithm 2: TrainST(t, θ0, {(x_i, y_i)}_{i∈[t]}, J, η, D, α, M). (A hedged sketch of a generic UCB selection loop, in the spirit of Algorithm 1, follows the table.)
Open Source Code | No | We are not yet ready to release the required codes and will do so as soon as our paper is accepted.
Open Datasets | Yes | We use the Energy Efficiency dataset [34] registered in the UCI Machine Learning Repository [1]. (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions creating a dataset of K arms and standardizing rewards, but it does not specify any training, validation, or test dataset splits. It only describes 'the response used for training the machine learning model'.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU or GPU models) used to run its experiments.
Software Dependencies | No | The paper mentions using 'stochastic gradient descent (SGD)' and a 'fully connected neural network model' and specifies learning rates and momentum. However, it does not provide specific version numbers for any software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | We employ a fully connected neural network model with two intermediate layers. Each of the two intermediate layers contains 33 units, one of which is a bias term. As for the tree-based model, we consider an ensemble of four soft trees, and the depth of each soft tree is three. The regularization coefficient λ for the parameters is fixed at 10^-4. We train the two models (ST, NN) using stochastic gradient descent (SGD) with a momentum term. The learning rate and the momentum are set to 0.01 and 0.9, respectively. SGD is performed in all rounds with a mini-batch size of 64 and 5 epochs, and we do not use early stopping. (A training-configuration sketch follows the table.)
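
The following is a minimal sketch, not the authors' released code, of the generic upper-confidence-bound loop that Algorithm 1 (ST-UCB) instantiates with a soft tree ensemble: pick the arm maximizing a predicted reward plus an uncertainty width computed from the model's parameter gradients, observe the reward, and retrain the model (the role of Algorithm 2, TrainST). The `model` interface, `beta`, and `lam` are placeholders, and the width term follows the standard neural-UCB construction rather than the paper's exact definitions.

```python
import numpy as np

def ucb_bandit_loop(model, arms, get_reward, T, beta=1.0, lam=1e-4):
    """Generic UCB loop in the spirit of Algorithm 1 (ST-UCB).

    `model` is assumed to expose (placeholder names, not the authors' API):
      - model.num_params: number of trainable parameters
      - model.predict(x): scalar reward estimate f(x; theta_t)
      - model.grad(x):    gradient of f w.r.t. the parameters (feature map)
      - model.fit(X, y):  retraining routine (the role of TrainST)
    """
    p = model.num_params
    Z = lam * np.eye(p)              # regularized design matrix
    X_hist, y_hist = [], []
    for t in range(T):
        # UCB index: predicted mean plus a width from the gradient features.
        Z_inv = np.linalg.inv(Z)
        scores = []
        for x in arms:
            g = model.grad(x)
            width = np.sqrt(g @ Z_inv @ g)
            scores.append(model.predict(x) + beta * width)
        x_t = arms[int(np.argmax(scores))]
        y_t = get_reward(x_t)        # observe the noisy reward
        # Update history and the design matrix, then retrain the model.
        X_hist.append(x_t)
        y_hist.append(y_t)
        g_t = model.grad(x_t)
        Z += np.outer(g_t, g_t)
        model.fit(np.array(X_hist), np.array(y_hist))
    return X_hist, y_hist
```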
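
For the Energy Efficiency data, the sketch below loads the dataset and standardizes the response used as the reward; it is a hedged illustration, not the authors' preprocessing. The `ucimlrepo` helper, the name-based lookup, and the use of the first target column as the reward are all assumptions; the paper only states that the dataset comes from the UCI Machine Learning Repository and that rewards are standardized.

```python
import numpy as np
from ucimlrepo import fetch_ucirepo  # assumption: data fetched via this helper

# Fetch the UCI Energy Efficiency dataset (name-based lookup is an assumption).
energy = fetch_ucirepo(name="Energy Efficiency")
X = energy.data.features.to_numpy(dtype=float)        # building attributes
y = energy.data.targets.to_numpy(dtype=float)[:, 0]   # first target as reward (assumption)

# Standardize the response used as the reward signal (zero mean, unit variance).
y = (y - y.mean()) / y.std()

# Each row then serves as one of the K candidate arms/contexts.
arms = [X[i] for i in range(len(X))]
```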
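
Finally, a sketch of the NN baseline and the optimizer settings stated in the Experiment Setup row: two intermediate layers of 32 units plus a bias term, SGD with learning rate 0.01 and momentum 0.9, mini-batches of 64, 5 epochs, and no early stopping. Mapping the regularization coefficient λ = 10^-4 to PyTorch's `weight_decay` and using ReLU activations are assumptions; the soft tree ensemble itself is not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def build_nn(input_dim: int) -> nn.Module:
    # "33 units, one of which is a bias term" is read as 32 learned units + bias.
    return nn.Sequential(
        nn.Linear(input_dim, 32), nn.ReLU(),
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )

def train(model, X, y, lr=0.01, momentum=0.9, weight_decay=1e-4,
          batch_size=64, epochs=5):
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum,
                          weight_decay=weight_decay)  # lambda -> weight decay (assumption)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):              # 5 epochs, no early stopping
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb)
            loss.backward()
            opt.step()
    return model
```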