Expected Improvement for Contextual Bandits

Authors: Hung Tran-The, Sunil Gupta, Santu Rana, Tuan Truong, Long Tran-Thanh, Svetha Venkatesh

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on various benchmark datasets show that both proposed algorithms work well and consistently outperform existing approaches, especially in high dimensions.
Researcher Affiliation | Collaboration | Hung Tran-The, Applied Artificial Intelligence Institute, Deakin University; Sunil Gupta, Applied Artificial Intelligence Institute, Deakin University; Santu Rana, Applied Artificial Intelligence Institute, Deakin University; Tuan Truong, FPT Software AI Center; Long Tran-Thanh, University of Warwick, UK; Svetha Venkatesh, Applied Artificial Intelligence Institute, Deakin University
Pseudocode | Yes | Algorithm 1 (The Linear Expected Improvement Algorithm, Lin EI) and Algorithm 2 (The Neural Expected Improvement Algorithm, Neural EI); a hedged sketch of the Lin EI selection loop is given after this table.
Open Source Code | Yes | Our source code is publicly available at https://github.com/Tran-TheHung/Expected-Improvement-for-Contextual-Bandits.
Open Datasets | Yes | We assess the performance of our Lin EI algorithm on several benchmark datasets including covertype, magic, avila, dry bean, statlog, letter, and pendigits, all from UCI [16] (Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017).
Dataset Splits | No | The paper does not explicitly state training/validation/test splits. It notes that 'the cumulative regret over time horizon T is measured by the total mistakes made by the algorithm', which implies the sequential processing typical of bandit problems rather than distinct data splits. It also mentions a grid search over hyperparameters 'to select the parameter with the best performance', which implies a validation step, but gives no explicit split details.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using neural networks and gradient descent, but does not provide specific software dependencies with version numbers, such as the Python, PyTorch, or other library versions.
Experiment Setup | Yes | We set the time horizon of our algorithm to 10000 for all datasets. For Lin EI, we set C0 = d and β = 2. For methods using a neural network, we use one-hidden-layer neural networks with 100 neurons to model the reward function. During posterior updating, gradient descent is run for 100 iterations with a learning rate of 0.001.
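
The pseudocode and setup rows above are easier to read against a concrete loop. Below is a minimal sketch of a Lin EI-style selection rule, written from the information in this table alone: it assumes a per-arm ridge-regression posterior, an incumbent equal to the best predicted mean over the current arms, and the usual conversion of a UCI classification dataset into a contextual bandit (one arm per class, reward 1 for the correct class, 0 otherwise). It is not the authors' implementation, and the paper's C0 parameter is not modelled here.

```python
"""Hedged sketch of a Lin EI-style contextual bandit loop (assumptions noted above)."""
import numpy as np
from scipy.stats import norm

def lin_ei_run(contexts, labels, n_arms, T=10_000, beta=2.0, reg=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = contexts.shape[1]
    # One ridge-regression model per arm (disjoint parameterisation, assumed).
    A = [reg * np.eye(d) for _ in range(n_arms)]   # per-arm precision matrices
    b = [np.zeros(d) for _ in range(n_arms)]       # per-arm response vectors
    mistakes = 0
    for i in rng.integers(0, len(labels), size=T): # sample contexts with replacement
        x = contexts[i]
        mu = np.empty(n_arms)
        sigma = np.empty(n_arms)
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]
            mu[a] = x @ theta                      # posterior mean reward
            sigma[a] = beta * np.sqrt(x @ A_inv @ x)  # beta-scaled exploration width
        incumbent = mu.max()                       # assumed incumbent choice
        z = (mu - incumbent) / np.maximum(sigma, 1e-12)
        ei = (mu - incumbent) * norm.cdf(z) + sigma * norm.pdf(z)
        arm = int(np.argmax(ei))                   # pick the arm with largest expected improvement
        reward = float(arm == labels[i])           # reward 1 if the chosen arm is the true class
        mistakes += 1 - int(reward)                # cumulative regret counted as total mistakes
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
    return mistakes

# Example: mistakes = lin_ei_run(X, y, n_arms=len(np.unique(y)))
```

With T = 10000 and β = 2 as reported in the Experiment Setup row, the returned mistake count corresponds to the cumulative-regret measure described in the Dataset Splits row.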
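For the neural variant, the Experiment Setup row only states the architecture and optimisation budget. The sketch below mirrors those numbers (one hidden layer with 100 neurons, gradient descent for 100 iterations at learning rate 0.001); the framework (PyTorch) and the squared-error loss are assumptions, not details taken from the paper.

```python
"""Hedged sketch of the reward model implied by the Experiment Setup row."""
import torch
import torch.nn as nn

def make_reward_model(d):
    # One hidden layer with 100 neurons, as stated in the setup row.
    return nn.Sequential(nn.Linear(d, 100), nn.ReLU(), nn.Linear(100, 1))

def refit(model, contexts, rewards, iters=100, lr=0.001):
    # "Posterior updating": full-batch gradient descent on the data observed so far.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(iters):
        opt.zero_grad()
        pred = model(contexts).squeeze(-1)
        loss = loss_fn(pred, rewards)
        loss.backward()
        opt.step()
    return model
```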