No-Regret Bandit Exploration based on Soft Tree Ensemble Model

Authors: Shogo Iwazaki, Shinya Suzumura

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our algorithm achieves a smaller cumulative regret compared to the existing ReLU-based neural bandit algorithms. We also show that this advantage comes with a trade-off: the hypothesis space of the soft tree ensemble model is more constrained than that of a ReLU-based neural network. In Sec. 4, we elucidate the distinctions in properties and assumptions between the existing NN-UCB and ST-UCB algorithms. Specifically, while NN-UCB generally lacks a no-regret guarantee in general action (or context) spaces, ST-UCB consistently offers a no-regret guarantee across general action spaces. Additionally, we examine the relation between the hypothesis spaces induced by the TNTK and those induced by the NTK using ReLU activation. This comparison reveals that the hypothesis space derived from soft trees, although more constrained, may lead to lower regret. In this section, we compare ST-UCB and NN-UCB to empirically demonstrate the usefulness of the tree-based model.
Researcher Affiliation | Industry | Shogo Iwazaki, LY Corporation, Tokyo, Japan, siwazaki@lycorp.co.jp; Shinya Suzumura, LY Corporation, Tokyo, Japan, ssuzmura@lycorp.co.jp
Pseudocode | Yes | Algorithm 1: The soft tree-based upper confidence bound (ST-UCB) algorithm. Algorithm 2: TrainST(t, θ0, {(x_i, y_i)}_{i∈[t]}, J, η, D, α, M). (A hedged sketch of a generic UCB selection loop, in the spirit of Algorithm 1, follows the table.)
Open Source Code | No | We are not yet ready to release the required codes and will do so as soon as our paper is accepted.
Open Datasets | Yes | We use the Energy Efficiency dataset [34] registered in the UCI Machine Learning Repository [1]. (A hedged data-loading sketch follows the table.)
Dataset Splits | No | The paper mentions creating a dataset of K arms and standardizing rewards, but it does not specify any training, validation, or test dataset splits. It only describes 'the response used for training the machine learning model'.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU or GPU models) used to run its experiments.
Software Dependencies | No | The paper mentions using 'stochastic gradient descent (SGD)' and a 'fully connected neural network model' and specifies learning rates and momentum. However, it does not provide specific version numbers for any software dependencies such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | We employ a fully connected neural network model with two intermediate layers. Each of the two intermediate layers contains 33 units, one of which is a bias term. As for the tree-based model, we consider an ensemble of four soft trees, and the depth of each soft tree is three. The regularization coefficient λ for the parameters is fixed at 10^-4. We train the two models (ST, NN) using stochastic gradient descent (SGD) with a momentum term. The learning rate and the momentum are set to 0.01 and 0.9, respectively. SGD is performed in all rounds with a mini-batch size of 64 and 5 epochs, and we do not use early stopping. (A training-configuration sketch follows the table.)
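
The following is a minimal sketch, not the authors' released code, of the generic upper-confidence-bound loop that Algorithm 1 (ST-UCB) instantiates with a soft tree ensemble: pick the arm maximizing a predicted reward plus an uncertainty width computed from the model's parameter gradients, observe the reward, and retrain the model (the role of Algorithm 2, TrainST). The `model` interface, `beta`, and `lam` are placeholders, and the width term follows the standard neural-UCB construction rather than the paper's exact definitions.

```python
import numpy as np

def ucb_bandit_loop(model, arms, get_reward, T, beta=1.0, lam=1e-4):
    """Generic UCB loop in the spirit of Algorithm 1 (ST-UCB).

    `model` is assumed to expose (placeholder names, not the authors' API):
      - model.num_params: number of trainable parameters
      - model.predict(x): scalar reward estimate f(x; theta_t)
      - model.grad(x):    gradient of f w.r.t. the parameters (feature map)
      - model.fit(X, y):  retraining routine (the role of TrainST)
    """
    p = model.num_params
    Z = lam * np.eye(p)              # regularized design matrix
    X_hist, y_hist = [], []
    for t in range(T):
        # UCB index: predicted mean plus a width from the gradient features.
        Z_inv = np.linalg.inv(Z)
        scores = []
        for x in arms:
            g = model.grad(x)
            width = np.sqrt(g @ Z_inv @ g)
            scores.append(model.predict(x) + beta * width)
        x_t = arms[int(np.argmax(scores))]
        y_t = get_reward(x_t)        # observe the noisy reward
        # Update history and the design matrix, then retrain the model.
        X_hist.append(x_t)
        y_hist.append(y_t)
        g_t = model.grad(x_t)
        Z += np.outer(g_t, g_t)
        model.fit(np.array(X_hist), np.array(y_hist))
    return X_hist, y_hist
```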
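
For the Energy Efficiency data, the sketch below loads the dataset and standardizes the response used as the reward; it is a hedged illustration, not the authors' preprocessing. The `ucimlrepo` helper, the name-based lookup, and the use of the first target column as the reward are all assumptions; the paper only states that the dataset comes from the UCI Machine Learning Repository and that rewards are standardized.

```python
import numpy as np
from ucimlrepo import fetch_ucirepo  # assumption: data fetched via this helper

# Fetch the UCI Energy Efficiency dataset (name-based lookup is an assumption).
energy = fetch_ucirepo(name="Energy Efficiency")
X = energy.data.features.to_numpy(dtype=float)        # building attributes
y = energy.data.targets.to_numpy(dtype=float)[:, 0]   # first target as reward (assumption)

# Standardize the response used as the reward signal (zero mean, unit variance).
y = (y - y.mean()) / y.std()

# Each row then serves as one of the K candidate arms/contexts.
arms = [X[i] for i in range(len(X))]
```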
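
Finally, a sketch of the NN baseline and the optimizer settings stated in the Experiment Setup row: two intermediate layers of 32 units plus a bias term, SGD with learning rate 0.01 and momentum 0.9, mini-batches of 64, 5 epochs, and no early stopping. Mapping the regularization coefficient λ = 10^-4 to PyTorch's `weight_decay` and using ReLU activations are assumptions; the soft tree ensemble itself is not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def build_nn(input_dim: int) -> nn.Module:
    # "33 units, one of which is a bias term" is read as 32 learned units + bias.
    return nn.Sequential(
        nn.Linear(input_dim, 32), nn.ReLU(),
        nn.Linear(32, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )

def train(model, X, y, lr=0.01, momentum=0.9, weight_decay=1e-4,
          batch_size=64, epochs=5):
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum,
                          weight_decay=weight_decay)  # lambda -> weight decay (assumption)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):              # 5 epochs, no early stopping
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb).squeeze(-1), yb)
            loss.backward()
            opt.step()
    return model
```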