Online Learning with Bounded Recall
Authors: Jon Schneider, Kiran Vodrahalli
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct some experiments to assess the performance of the bounded-recall learning algorithms we introduce in some simple settings. |
| Researcher Affiliation | Industry | 1Google. Correspondence to: Jon Schneider <jschnei@google.com>, Kiran Vodrahalli <kirannv@google.com>. |
| Pseudocode | Yes | Algorithm 1 PeriodicRestart; Algorithm 3 RandomizedAverageRestart; Algorithm 4 AverageRestartFullHorizon |
| Open Source Code | No | The paper does not provide explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper uses simulated environments rather than publicly available datasets. For example, 'In the first environment, we simulate periodic drifting rewards...' and 'In the second environment, we simulate the adversarial example from the proof of Lemma 4.2 (with M = T/3).' |
| Dataset Splits | No | The paper conducts simulations in synthetic environments and does not describe standard dataset splits (training, validation, test) with specific percentages or counts. |
| Hardware Specification | No | The paper conducts simulations but does not specify any hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'Multiplicative Weights Update algorithm' but does not name any specific software dependencies (programming languages, libraries, or frameworks) with version numbers. |
| Experiment Setup | Yes | In particular, we consider Algorithms 1 (PeriodicRestart), 3 (AverageRestart), 4 (AverageRestartFullHorizon), and an M-bounded-recall mean-based algorithm, all based off of the Multiplicative Weights Update algorithm with fixed learning rate η = 1/2. All algorithms we simulate in the first environment have M = 0.15T. In the second environment, we simulate the adversarial example from the proof of Lemma 4.2 (with M = T/3). In Figure 2, we plot the cumulative regret over time for both environments, averaged over ten independent runs with T = 1000. |
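To make the setup above concrete, here is a minimal sketch (not the authors' code, which is not released) of an M-bounded-recall learner in the PeriodicRestart style: Multiplicative Weights Update with η = 1/2 whose weights are reset every M rounds, run against a periodic drifting-rewards environment with T = 1000 and M = 0.15T, as in the paper's first experiment. The exact reward process is not specified in the paper, so `drifting_rewards` and its `period` parameter are assumptions chosen purely for illustration.

```python
import numpy as np

def periodic_restart_mwu(rewards, M, eta=0.5):
    """Sketch of an M-bounded-recall MWU learner in the PeriodicRestart style:
    weights are reset to uniform every M rounds, so the action distribution
    depends only on the most recent <= M reward vectors.

    rewards: (T, K) array of per-action rewards in [0, 1].
    Returns the learner's expected reward at each round.
    """
    T, K = rewards.shape
    log_w = np.zeros(K)
    expected = np.empty(T)
    for t in range(T):
        if t % M == 0:
            log_w = np.zeros(K)  # bounded-recall restart: forget all history
        p = np.exp(log_w - log_w.max())  # softmax over log-weights (stable)
        p /= p.sum()
        expected[t] = p @ rewards[t]
        log_w += eta * rewards[t]  # multiplicative-weights update, eta = 1/2
    return expected

def drifting_rewards(T, period=200):
    """Hypothetical periodic drifting-rewards environment (an assumption,
    not the paper's exact process): two actions oscillating out of phase."""
    t = np.arange(T)
    r0 = 0.5 + 0.5 * np.sin(2 * np.pi * t / period)
    return np.stack([r0, 1.0 - r0], axis=1)

T = 1000
M = int(0.15 * T)
R = drifting_rewards(T)
expected = periodic_restart_mwu(R, M)
# External regret against the best fixed action in hindsight.
regret = np.cumsum(R, axis=0).max(axis=1) - np.cumsum(expected)
print(f"final cumulative regret: {regret[-1]:.1f}")
```

Resetting the weights every M rounds is what makes the sketch M-bounded-recall: after each restart, the played distribution is a function of at most the last M reward vectors, mirroring the recall constraint the paper studies.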