Expected Improvement for Contextual Bandits
Authors: Hung Tran-The, Sunil Gupta, Santu Rana, Tuan Truong, Long Tran-Thanh, Svetha Venkatesh
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on various benchmark datasets show that both proposed algorithms work well and consistently outperform existing approaches, especially in high dimensions. |
| Researcher Affiliation | Collaboration | Hung Tran-The, Applied Artificial Intelligence Institute, Deakin University; Sunil Gupta, Applied Artificial Intelligence Institute, Deakin University; Santu Rana, Applied Artificial Intelligence Institute, Deakin University; Tuan Truong, FPT Software AI Center; Long Tran-Thanh, University of Warwick, UK; Svetha Venkatesh, Applied Artificial Intelligence Institute, Deakin University |
| Pseudocode | Yes | Algorithm 1, The Linear Expected Improvement Algorithm (LinEI), and Algorithm 2, The Neural Expected Improvement Algorithm (NeuralEI); a hedged LinEI sketch follows the table. |
| Open Source Code | Yes | Our source code is publicly available at https://github.com/Tran-TheHung/Expected-Improvement-for-Contextual-Bandits. |
| Open Datasets | Yes | We assess the performance of our LinEI algorithm on several benchmark datasets including covertype, magic, avila, dry bean, statlog, letter, and pendigits, all from UCI [16]. Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017. |
| Dataset Splits | No | The paper does not explicitly state training/validation/test splits. It notes that 'the cumulative regret over time horizon T is measured by the total mistakes made by the algorithm', implying the sequential processing typical of bandit problems rather than distinct data splits. It also mentions a grid search over hyperparameters 'to select the parameter with the best performance', which implies some form of validation, but no split details are given. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using neural networks and gradient descent, but does not provide specific software dependencies with version numbers, such as Python version, PyTorch version, or other library versions. |
| Experiment Setup | Yes | We set the time horizon of our algorithm to 10000 for all data sets. For LinEI, we set C0 = d and β = 2. For methods using the neural network, we use one-hidden-layer neural networks with 100 neurons to model the reward function. During posterior updating, gradient descent is run for 100 iterations with learning rate 0.001; a minimal sketch of this setup follows the table. |
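
To make the Pseudocode row concrete, here is a minimal sketch of an Expected Improvement rule for linear contextual bandits, built on standard ridge-regression (LinUCB-style) posterior updates. The class name, the ridge update, and the exact EI acquisition formula are assumptions made for illustration and are not copied from the paper's Algorithm 1; only β = 2 and C0 = d are taken from the reported setup, and the released code should be consulted for the authors' exact procedure.

```python
import numpy as np
from scipy.stats import norm

class LinEISketch:
    """Hypothetical LinEI-style bandit: ridge-regression posterior + EI arm selection."""

    def __init__(self, d, beta=2.0, lam=1.0):
        self.beta = beta                 # exploration scale (beta = 2 as reported)
        self.c0 = d                      # C0 = d as reported in the paper
        self.A = lam * np.eye(d)         # regularised design matrix
        self.b = np.zeros(d)             # reward-weighted feature sum
        self.best = -np.inf              # incumbent: best reward observed so far

    def select(self, contexts):
        """contexts: (K, d) array, one feature vector per arm; returns the chosen arm index."""
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        scores = []
        for x in contexts:
            mu = x @ theta                                   # posterior mean reward
            sigma = np.sqrt(self.beta * x @ A_inv @ x)       # scaled posterior std
            z = (mu - self.best) / max(sigma, 1e-12)
            ei = (mu - self.best) * norm.cdf(z) + sigma * norm.pdf(z)
            scores.append(ei)
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Ridge-regression update after observing the reward of the pulled arm."""
        self.A += np.outer(x, x)
        self.b += reward * x
        self.best = max(self.best, reward)
```

A usage loop would draw contexts for each arm at every round, call `select`, observe the reward of the chosen arm, and call `update`, repeating for the reported horizon of 10000 rounds.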
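
The Experiment Setup row reports a one-hidden-layer reward network with 100 neurons, refit by gradient descent for 100 iterations at learning rate 0.001. The PyTorch sketch below mirrors that configuration; the names `RewardNet` and `refit` are illustrative and do not come from the released code.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """One-hidden-layer reward model with 100 neurons, as reported in the setup."""

    def __init__(self, d, hidden=100):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x)

def refit(model, contexts, rewards, iters=100, lr=0.001):
    """Gradient-descent update of the reward model on observed (context, reward) pairs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(iters):
        opt.zero_grad()
        pred = model(contexts).squeeze(-1)
        loss = loss_fn(pred, rewards)
        loss.backward()
        opt.step()
    return model
```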