Online Limited Memory Neural-Linear Bandits with Likelihood Matching
Authors: Ofir Nabati, Tom Zahavy, Shie Mannor
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our algorithm, Limited Memory Neural-Linear with Likelihood Matching (NeuralLinear-LiM2), on a variety of datasets and observed that it achieves performance comparable to the unlimited memory approach while exhibiting resilience to catastrophic forgetting. We performed experiments on several real-world and simulated datasets, using Multi-Layered Perceptrons (MLPs). |
| Researcher Affiliation | Collaboration | 1 Department of Electrical Engineering, Technion Institute of Technology, Israel; 2 DeepMind; 3 Nvidia Research. |
| Pseudocode | Yes | Algorithm 1: TS for linear contextual bandits; Algorithm 2: Limited Memory Neural-Linear TS with Likelihood Matching (NeuralLinear-LiM2); Algorithm 3: Projected Gradient Descent (PGD). A minimal sketch of the linear TS step appears below the table. |
| Open Source Code | Yes | The code of our algorithm is based on the code provided by Riquelme et al. (2018) and is available online (footnote 1: code is available on GitHub). |
| Open Datasets | Yes | All of these datasets are publicly available through the UCI Machine Learning Repository (footnote 2: https://archive.ics.uci.edu/ml/index.php). |
| Dataset Splits | No | Every iteration, we train f for P minibatches. Training is performed by sampling experience tuples {b(τ), a(τ), r_{a(τ)}(τ)} from the replay buffer E (details below) and minimizing the mean squared error (MSE); see the training sketch below the table. No explicit train/validation/test splits with percentages or counts are provided for the datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions types of neural networks and optimization methods (e.g., MLPs, CNNs, SGD), but it does not list specific software libraries or frameworks with their version numbers. |
| Experiment Setup | Yes | In all the experiments, we used the same hyperparameters as in Riquelme et al. (2018). E.g., the network architecture is an MLP with a single hidden layer of size 50. The size of the memory buffer is set to 100 per action for the limited memory algorithms, and the batch size is set to 16 times the number of actions. The initial learning rate for both the DNN training and the moments matching was set to 0.01 with a decay factor of 1/t. These values are wired together in the configuration sketch below the table. |
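
For orientation, here is a minimal Python sketch of Thompson Sampling for linear contextual bandits, in the spirit of Algorithm 1 quoted above. This is not the authors' implementation; the class name, the prior scale `lam`, and the noise variance `noise_var` are assumptions for illustration.

```python
import numpy as np

class LinearTS:
    """Minimal Thompson Sampling for linear contextual bandits.

    Per action a, maintains a Gaussian posterior over reward weights via
    precision Phi_a = lam*I + sum_t b_t b_t^T and f_a = sum_t r_t * b_t.
    A sketch only; hyperparameters here are assumptions, not the paper's.
    """

    def __init__(self, n_actions, dim, lam=1.0, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = [lam * np.eye(dim) for _ in range(n_actions)]
        self.f = [np.zeros(dim) for _ in range(n_actions)]

    def act(self, context):
        scores = []
        for Phi, f in zip(self.precision, self.f):
            cov = self.noise_var * np.linalg.inv(Phi)
            mean = cov @ f / self.noise_var          # = Phi^{-1} f
            theta = np.random.multivariate_normal(mean, cov)  # posterior sample
            scores.append(context @ theta)
        return int(np.argmax(scores))                # act greedily on the sample

    def update(self, context, action, reward):
        self.precision[action] += np.outer(context, context)
        self.f[action] += reward * context
```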
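
The replay-buffer training quoted in the Dataset Splits row (train the representation f for P minibatches of tuples sampled from the buffer E, minimizing MSE) can be sketched as follows. The function name, data layout, and the use of PyTorch are assumptions made for brevity; the released code builds on Riquelme et al. (2018) instead.

```python
import random
import torch
import torch.nn as nn

def train_representation(net, optimizer, buffer, n_minibatches, batch_size):
    """Train f on (context b, action a, reward r) tuples from replay buffer E.

    Assumes `net` maps a batch of contexts to per-action reward predictions.
    A sketch under assumed data layout, not the authors' code.
    """
    loss_fn = nn.MSELoss()
    for _ in range(n_minibatches):                  # P minibatches per iteration
        batch = random.sample(buffer, min(batch_size, len(buffer)))
        contexts = torch.stack(
            [torch.as_tensor(b, dtype=torch.float32) for b, a, r in batch])
        actions = torch.tensor([a for b, a, r in batch])
        rewards = torch.tensor([r for b, a, r in batch], dtype=torch.float32)
        preds = net(contexts)                       # (batch, n_actions)
        chosen = preds.gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = loss_fn(chosen, rewards)             # MSE on observed rewards only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```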
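
Finally, the hyperparameters quoted in the Experiment Setup row can be wired together as below. The numbers (hidden layer of size 50, buffer of 100 per action, batch size 16 times the number of actions, learning rate 0.01 with 1/t decay) come from the row above; the module structure, optimizer, and scheduler call are assumptions.

```python
import torch
import torch.nn as nn

def make_experiment(context_dim, n_actions):
    """Assemble the quoted hyperparameters; structure is an assumption."""
    net = nn.Sequential(                 # MLP with a single hidden layer of size 50
        nn.Linear(context_dim, 50),
        nn.ReLU(),
        nn.Linear(50, n_actions),
    )
    buffer_size = 100 * n_actions        # memory buffer: 100 tuples per action
    batch_size = 16 * n_actions          # batch size: 16 x number of actions
    optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
    # Initial learning rate 0.01 with a 1/t decay: lr_t = 0.01 / t
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda t: 1.0 / (t + 1))
    return net, optimizer, scheduler, buffer_size, batch_size
```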