Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality

Authors: Dhruv Malik, Conor Igoe, Yuanzhi Li, Aarti Singh

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4. Numerical Results. In this section, we evaluate the performance of Algorithm 1 (denoted SE) in different domains which are modeled as WTB problems satisfying REO. In each domain, we compare this performance to the following baselines: (A) the EXP3 algorithm (Auer et al., 2002b), which has sublinear traditional regret in our setting; (B) the batched version of EXP3 described by Arora et al. (2012), denoted EXP3B, which has sublinear CPR in our setting; (C) the modified UCB algorithm described in Section 3."
Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA. Correspondence to: Dhruv Malik <dhruvm@andrew.cmu.edu>.
Pseudocode | Yes | Algorithm 1: Successive Elimination for WTB with REO
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for the methodology is being released.
Open Datasets | No | The paper uses synthetic losses and simulates a dart throwing tournament. It does not provide access information (link, citation, or repository name) for any publicly available or open dataset. For example, Section 4.3 states: "we simulate a simplified dart throwing tournament with K = 20 players."
Dataset Splits | No | The paper describes numerical simulations and a simulated dart tournament for online learning, which are not typically evaluated using fixed train/validation/test splits. No dataset split information (percentages, counts, or predefined splits) is provided for reproducing data partitioning.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or cloud instances.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks) used in the experiments.
Experiment Setup | Yes | "Algorithm 1 Successive Elimination for WTB with REO. Require: upper bound M on memory capacity m, time horizon T, failure probability tolerance δ ∈ (0, 1)... In (a) we fix K = 5, m = 3 and M = 3. In (b) we fix m = 4 and T = 10^6. ... with a choice of m = 4, K = 5, M = 4."
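Baselines (A) and (B) differ mainly in how often the learner may switch arms. The Python sketch below shows one common loss-based form of EXP3 (exponential weights over importance-weighted loss estimates, without an explicit uniform-exploration term) wrapped in a blocking loop in the spirit of the batched variant of Arora et al. (2012). With `batch = 1` it reduces to plain EXP3; with `batch > 1` it commits to each sampled arm for `batch` consecutive rounds and feeds back that block's average loss. The function name `exp3_batched`, the `loss_fn(t, arm)` interface and the learning-rate choice are assumptions for illustration, not the paper's implementation.

```python
import math
import random
from typing import Callable, List


def exp3_batched(
    num_arms: int,
    horizon: int,
    loss_fn: Callable[[int, int], float],
    batch: int = 1,
    seed: int = 0,
) -> List[int]:
    """Hypothetical sketch: EXP3 run over blocks of `batch` rounds.

    batch=1 is plain (loss-based) EXP3. loss_fn(t, arm) must return a loss
    in [0, 1] for playing `arm` at round t. Returns the arms played.
    """
    rng = random.Random(seed)
    num_blocks = math.ceil(horizon / batch)
    # Learning rate commonly used for loss-based EXP3 over num_blocks decisions.
    eta = math.sqrt(2.0 * math.log(num_arms) / (num_blocks * num_arms))
    cum_est_loss = [0.0] * num_arms
    played: List[int] = []
    t = 0
    for _ in range(num_blocks):
        # Exponential weights; subtracting the minimum avoids overflow.
        lo = min(cum_est_loss)
        weights = [math.exp(-eta * (c - lo)) for c in cum_est_loss]
        total = sum(weights)
        probs = [w / total for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        # Commit to the sampled arm for the whole block (shorter final block).
        block_losses = []
        for _ in range(min(batch, horizon - t)):
            block_losses.append(loss_fn(t, arm))
            played.append(arm)
            t += 1
        avg_loss = sum(block_losses) / len(block_losses)
        # Importance-weighted estimate of the block loss, as in standard EXP3.
        cum_est_loss[arm] += avg_loss / probs[arm]
    return played
```

For example, `exp3_batched(5, 2000, lambda t, a: 0.0 if a == 0 else 1.0)` should concentrate most plays on arm 0. Blocking reduces the number of arm switches, which is broadly the mechanism behind batched guarantees such as the sublinear CPR attributed to EXP3B above; the block length EXP3B uses in the paper is not stated in this summary.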
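To make the roles of M, T and δ in Algorithm 1 more concrete, here is a minimal Python sketch of successive elimination on a hypothetical weighted-tallying environment. Everything in it is an assumption rather than the paper's specification: the loss model (a base loss plus a penalty proportional to how often the arm was played in the last m rounds, with uniform weights), the sampling scheme (play an arm M times so its tally saturates, then record one loss), and the Hoeffding-style elimination rule. The class `TallyEnv` and the function `successive_elimination` are illustrative names only. The sketch just shows the general successive-elimination pattern: estimate each active arm's repeated-play loss and discard arms whose lower confidence bound exceeds the best upper confidence bound.

```python
import math
import random
from collections import deque
from typing import List


class TallyEnv:
    """Hypothetical WTB-style environment (an assumption, not the paper's model).

    Loss of arm a = base[a] + slope[a] * tally / m + Gaussian noise, clipped
    to [0, 1], where tally counts plays of a in the previous m rounds.
    """

    def __init__(self, base: List[float], slope: List[float], m: int,
                 noise: float = 0.05, seed: int = 0):
        self.base, self.slope, self.m, self.noise = base, slope, m, noise
        self.history: deque = deque(maxlen=m)  # arms played in the last m rounds
        self.rng = random.Random(seed)

    def play(self, arm: int) -> float:
        tally = sum(1 for a in self.history if a == arm)
        mean = self.base[arm] + self.slope[arm] * tally / self.m
        self.history.append(arm)
        return min(1.0, max(0.0, mean + self.rng.gauss(0.0, self.noise)))


def successive_elimination(env: TallyEnv, num_arms: int, M: int, T: int,
                           delta: float) -> int:
    """Hypothetical sketch: return the arm with the lowest estimated repeated-play loss."""
    active = list(range(num_arms))
    sums = [0.0] * num_arms
    counts = [0] * num_arms
    t = 0
    # Each sweep samples every active arm once, costing M + 1 rounds per arm.
    while len(active) > 1 and t + len(active) * (M + 1) <= T:
        for arm in active:
            # With M >= m, M warm-up plays fill the memory with this arm,
            # so the next loss reflects repeated exposure.
            for _ in range(M):
                env.play(arm)
            sums[arm] += env.play(arm)
            counts[arm] += 1
            t += M + 1
        means = {a: sums[a] / counts[a] for a in active}
        n = counts[active[0]]  # all active arms have the same sample count
        # Hoeffding radius for [0, 1] losses, union bound over arms and rounds.
        radius = math.sqrt(math.log(2 * num_arms * T / delta) / (2 * n))
        best_ucb = min(means[a] + radius for a in active)
        active = [a for a in active if means[a] - radius <= best_ucb]
    return min(active, key=lambda a: sums[a] / max(counts[a], 1))
```

A possible usage, echoing the sizes of setting (a) (K = 5, m = 3, M = 3) but with invented losses: `successive_elimination(TallyEnv([0.6, 0.2, 0.1, 0.7, 0.5], [0.0, 0.1, 0.3, 0.0, 0.0], m=3), num_arms=5, M=3, T=10**5, delta=0.05)` should return arm 1. Arm 2 has the smallest first-play loss here but a larger repeated-play loss than arm 1, which is the kind of difference that sampling only after the tally saturates is meant to expose. In this sketch the returned arm would then be played for the remaining rounds.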