Online Learning with Bounded Recall

Authors: Jon Schneider, Kiran Vodrahalli

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct some experiments to assess the performance of the bounded-recall learning algorithms we introduce in some simple settings.
Researcher Affiliation | Industry | 1Google. Correspondence to: Jon Schneider <jschnei@google.com>, Kiran Vodrahalli <kirannv@google.com>.
Pseudocode | Yes | Algorithm 1 PERIODICRESTART; Algorithm 3 RANDOMIZEDAVERAGERESTART; Algorithm 4 AVERAGERESTARTFULLHORIZON
Open Source Code | No | The paper does not provide explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper uses simulated environments rather than publicly available datasets. For example, 'In the first environment, we simulate periodic drifting rewards...' and 'In the second environment, we simulate the adversarial example from the proof of Lemma 4.2 (with M = T/3).'
Dataset Splits | No | The paper conducts simulations in synthetic environments and does not describe standard dataset splits (training, validation, test) with specific percentages or counts.
Hardware Specification | No | The paper conducts simulations but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions using the 'Multiplicative Weights Update algorithm' but does not provide specific software names with version numbers for implementation, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | In particular, we consider Algorithms 1 (PERIODICRESTART), 3 (AVERAGERESTART), 4 (AVERAGERESTARTFULLHORIZON), and an M-bounded-recall mean-based algorithm, all based off of the Multiplicative Weights Update algorithm with fixed learning rate η = 1/2. All algorithms we simulate in the first environment have M = 0.15T. In the second environment, we simulate the adversarial example from the proof of Lemma 4.2 (with M = T/3). In Figure 2, we plot the cumulative regret over time for both environments, averaged over ten independent runs with T = 1000.
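Since no code is released, the setup above can only be approximated. The sketch below is a minimal illustration of the PERIODICRESTART idea (Algorithm 1): Multiplicative Weights Update whose state is wiped every M rounds, so the learner only ever depends on at most the last M rounds of history. The parameters η = 1/2, M = 0.15T, and T = 1000 follow the quoted setup; the specific "periodic drifting rewards" schedule (a sinusoidal alternation between two actions) is an assumption, since the paper's exact environment is not reproduced here.

```python
import numpy as np

def periodic_restart_mwu(rewards, M, eta=0.5):
    """MWU over a (T x K) reward matrix, restarting the weights every M rounds.

    Sketch of the PERIODICRESTART idea: forgetting all history at the start
    of each length-M block makes the algorithm M-bounded-recall.
    Returns the per-round expected reward of the played mixed action.
    """
    T, K = rewards.shape
    weights = np.ones(K)
    realized = np.zeros(T)
    for t in range(T):
        if t % M == 0:                       # restart: wipe accumulated history
            weights = np.ones(K)
        p = weights / weights.sum()          # current mixed action
        realized[t] = p @ rewards[t]         # expected reward this round
        weights *= np.exp(eta * rewards[t])  # multiplicative update (full feedback)
    return realized

# Hypothetical periodic drifting-reward environment (an assumption, not the
# paper's exact schedule): the better of two actions alternates sinusoidally.
T, K = 1000, 2
t = np.arange(T)
drift = 0.4 * np.sin(2 * np.pi * t / 200)
rewards = np.stack([0.5 + drift, 0.5 - drift], axis=1)  # values in [0.1, 0.9]

M = int(0.15 * T)                            # M = 0.15T, as in the quoted setup
realized = periodic_restart_mwu(rewards, M)
best_fixed = rewards.sum(axis=0).max()       # best single action in hindsight
regret = best_fixed - realized.sum()         # cumulative (external) regret
```

In a drifting environment like this, restarting can even beat the best fixed action in hindsight (negative regret), which is the intuition for why bounded-recall restarts are evaluated against non-stationary baselines.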