Online Limited Memory Neural-Linear Bandits with Likelihood Matching
Authors: Ofir Nabati, Tom Zahavy, Shie Mannor
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our algorithm, Limited Memory Neural-Linear with Likelihood Matching (NeuralLinear-LiM2), on a variety of datasets and observed that it achieves performance comparable to the unlimited memory approach while exhibiting resilience to catastrophic forgetting. We performed experiments on several real-world and simulated datasets, using Multi-Layered Perceptrons (MLPs). |
| Researcher Affiliation | Collaboration | 1 Department of Electrical Engineering, Technion Institute of Technology, Israel; 2 DeepMind; 3 Nvidia Research. |
| Pseudocode | Yes | Algorithm 1: TS for linear contextual bandits; Algorithm 2: Limited Memory Neural-Linear TS with Likelihood Matching (NeuralLinear-LiM2); Algorithm 3: Projected Gradient Descent (PGD). A minimal sketch of the linear TS step appears below the table. |
| Open Source Code | Yes | The code of our algorithm is based on the code provided by Riquelme et al. (2018) and is available online (footnote 1: code is available on GitHub). |
| Open Datasets | Yes | All of these datasets are publicly available through the UCI Machine Learning Repository (footnote 2: https://archive.ics.uci.edu/ml/index.php). |
| Dataset Splits | No | Every iteration, we train f for P minibatches. Training is performed by sampling experience tuples {b(τ), a(τ), r_{a(τ)}(τ)} from the replay buffer E (details below) and minimizing the mean squared error (MSE); see the training sketch below the table. No explicit train/validation/test splits with percentages or counts are provided for the datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions types of neural networks and optimization methods (e.g., MLPs, CNNs, SGD), but it does not list specific software libraries or frameworks with their version numbers. |
| Experiment Setup | Yes | In all the experiments, we used the same hyperparameters as in Riquelme et al. (2018). E.g., the network architecture is an MLP with a single hidden layer of size 50. The size of the memory buffer is set to 100 per action for the limited memory algorithms, and the batch size is set to 16 times the number of actions. The initial learning rate for both the DNN training and the moments matching was set to 0.01 with a decay factor of 1/t. These values are wired together in the configuration sketch below the table. |
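
For orientation, here is a minimal Python sketch of Thompson Sampling for linear contextual bandits, in the spirit of Algorithm 1 quoted above. This is not the authors' implementation; the class name, the prior scale `lam`, and the noise variance `noise_var` are assumptions for illustration.

```python
import numpy as np

class LinearTS:
    """Minimal Thompson Sampling for linear contextual bandits.

    Per action a, maintains a Gaussian posterior over reward weights via
    precision Phi_a = lam*I + sum_t b_t b_t^T and f_a = sum_t r_t * b_t.
    A sketch only; hyperparameters here are assumptions, not the paper's.
    """

    def __init__(self, n_actions, dim, lam=1.0, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = [lam * np.eye(dim) for _ in range(n_actions)]
        self.f = [np.zeros(dim) for _ in range(n_actions)]

    def act(self, context):
        scores = []
        for Phi, f in zip(self.precision, self.f):
            cov = self.noise_var * np.linalg.inv(Phi)
            mean = cov @ f / self.noise_var          # = Phi^{-1} f
            theta = np.random.multivariate_normal(mean, cov)  # posterior sample
            scores.append(context @ theta)
        return int(np.argmax(scores))                # act greedily on the sample

    def update(self, context, action, reward):
        self.precision[action] += np.outer(context, context)
        self.f[action] += reward * context
```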
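
The replay-buffer training quoted in the Dataset Splits row (train the representation f for P minibatches of tuples sampled from the buffer E, minimizing MSE) can be sketched as follows. The function name, data layout, and the use of PyTorch are assumptions made for brevity; the released code builds on Riquelme et al. (2018) instead.

```python
import random
import torch
import torch.nn as nn

def train_representation(net, optimizer, buffer, n_minibatches, batch_size):
    """Train f on (context b, action a, reward r) tuples from replay buffer E.

    Assumes `net` maps a batch of contexts to per-action reward predictions.
    A sketch under assumed data layout, not the authors' code.
    """
    loss_fn = nn.MSELoss()
    for _ in range(n_minibatches):                  # P minibatches per iteration
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        contexts = torch.stack(
            [torch.as_tensor(b, dtype=torch.float32) for b, a, r in batch])
        actions = torch.tensor([a for b, a, r in batch])
        rewards = torch.tensor([r for b, a, r in batch], dtype=torch.float32)
        preds = net(contexts)                       # (batch, n_actions)
        chosen = preds.gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss_fn(chosen, rewards)             # MSE on observed rewards only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```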
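
Finally, the hyperparameters quoted in the Experiment Setup row can be wired together as below. The numbers (hidden layer of size 50, buffer of 100 per action, batch size 16 times the number of actions, learning rate 0.01 with 1/t decay) come from the row above; the module structure, optimizer, and scheduler call are assumptions.

```python
import torch
import torch.nn as nn

def make_experiment(context_dim, n_actions):
    """Assemble the quoted hyperparameters; structure is an assumption."""
    net = nn.Sequential(                 # MLP with a single hidden layer of size 50
        nn.Linear(context_dim, 50),
        nn.ReLU(),
        nn.Linear(50, n_actions),
    )
    buffer_size = 100 * n_actions        # memory buffer: 100 tuples per action
    batch_size = 16 * n_actions          # batch size: 16 x number of actions
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
    # Initial learning rate 0.01 with a 1/t decay: lr_t = 0.01 / t
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda t: 1.0 / (t + 1))
    return net, optimizer, scheduler, buffer_size, batch_size
```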