From Finite to Countable-Armed Bandits
Authors: Anand Kalvit, Assaf Zeevi
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also demonstrate empirically that a key concentration property of UCB1 that is pivotal to the aforementioned regret guarantee, is violated for Thompson Sampling (TS) and therefore, ALG(TS, Θ, 2) suffers linear regret. A formal statement of said concentration property of UCB1 is deferred to Section 4. The regret upper bound of ALG(UCB1, Θ, 2) is stated next in Theorem 3. Following is an auxiliary proposition that is useful towards Theorem 3. Empirical illustration. Figure 2 shows the histogram of the fraction of time a particular arm of a two-armed bandit having statistically identical arms with Bernoulli(0.5) rewards each was played under different algorithms. The leftmost plot corresponds to UCB1 and is evidently in consonance with the concentration property stated in part (i) of Theorem 4. The concentration phenomenon under UCB1 can be understood through the lens of reward stochasticity. Consider the simplest case where the rewards are deterministic. Then, we know from the structure of UCB1 that any arm is played at most twice before the algorithm switches over to the other arm. This results in N1(n)/n converging to 1/2 pathwise, with an arm switch-over time that is at most 2. As the reward stochasticity increases, so does the arm switch-over time, which adversely affects this convergence. While it is a priori unclear whether N1(n)/n would still converge to 1/2 in some mode if the rewards are stochastic, part (ii) of Theorem 4 states that the convergence indeed holds, albeit only in probability. A significant spread around 1/2 in the leftmost plot despite n = 10^4 plays indicates a rather slow convergence. (A UCB1 simulation sketch of this concentration experiment appears below the table.) |
| Researcher Affiliation | Academia | Anand Kalvit and Assaf Zeevi, Graduate School of Business, Columbia University, New York, USA. {akalvit22, assaf}@gsb.columbia.edu |
| Pseudocode | Yes | Algorithm 1 ETC(2): ETC for an infinite population of arms with \|T\| = 2. ... Algorithm 2 ALG(Ξ, Θ, 2): An algorithmic framework for countable-armed bandits with \|T\| = 2. (A generic explore-then-commit reference sketch appears below the table.) |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to a code repository for their methodology. |
| Open Datasets | No | The paper describes using synthetic reward distributions such as "Bernoulli(µi) rewards", "Gaussian(0.5,1) rewards", and "Bernoulli(0.5) rewards" for simulations. It does not use or provide access to a pre-existing, publicly available dataset in the conventional sense (e.g., CIFAR-10, MNIST). |
| Dataset Splits | No | The paper does not describe specific training, validation, or test dataset splits. It relies on theoretical analysis and simulations where data is generated from distributions, rather than processed from fixed datasets with predefined splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the simulations or computations. |
| Software Dependencies | No | The paper mentions algorithms like UCB1 and Thompson Sampling, but does not specify any software libraries, frameworks, or their version numbers that would be needed to reproduce the work. |
| Experiment Setup | Yes | Theorem 3: "where m0 > 0 and γ > 2 are user-defined parameters that ensure Θ satisfies the conditions of Proposition 1 (for example, m0 = 11 and γ = 2.1 is an acceptable configuration)." Figure 1: "Monte-Carlo estimates of β plotted against using (2) with m0 = 4000 and γ = 2.1." Figure 2: "Bernoulli(0.5) rewards: Histogram of the fraction of plays of arm 1 until time n = 10,000, i.e., N1(10^4)/10^4, under three different algorithms. Number of replications under each algorithm ℵ = 20,000." (A Monte Carlo sketch reproducing this Figure 2 setup appears below the table.) |
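The Research Type row quotes the paper's discussion of how N1(n)/n concentrates around 1/2 under UCB1 for two statistically identical arms. The following minimal sketch (not the authors' code) simulates that setting; it assumes the standard UCB1 index of Auer et al. (empirical mean plus sqrt(2 ln t / N_i)), and the function name `ucb1_play_fraction` is illustrative.

```python
# Illustrative sketch, not from the paper: standard UCB1 on two statistically
# identical Bernoulli(0.5) arms, tracking the fraction of plays of arm 1.
import numpy as np


def ucb1_play_fraction(n=10_000, p=(0.5, 0.5), seed=None):
    """Run UCB1 for n rounds on two Bernoulli arms; return N1(n)/n."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(2)  # N_i(n): number of plays of arm i
    sums = np.zeros(2)    # cumulative reward collected from arm i
    for t in range(1, n + 1):
        if t <= 2:
            arm = t - 1  # initialization: play each arm once
        else:
            index = sums / counts + np.sqrt(2.0 * np.log(t) / counts)  # UCB1 index
            arm = int(np.argmax(index))
        counts[arm] += 1
        sums[arm] += float(rng.random() < p[arm])  # Bernoulli(p[arm]) reward
    return counts[0] / n  # fraction of plays of arm 1


if __name__ == "__main__":
    # With identical deterministic rewards, UCB1 plays each arm at most twice
    # before switching, so N1(n)/n sits at 1/2 pathwise; with Bernoulli(0.5)
    # rewards the fraction concentrates around 1/2 only in probability and
    # still shows noticeable spread at n = 10^4, as in the paper's Figure 2.
    print(ucb1_play_fraction(seed=0))
```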
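To mirror the Figure 2 setup quoted in the Experiment Setup row, the sketch below runs many independent replications of the same two-armed Bernoulli(0.5) experiment and histograms N1(10^4)/10^4. It is an illustrative reconstruction under the same UCB1 index assumption as above; the replication count is lowered from the paper's 20,000 for runtime and can be raised to match.

```python
# Illustrative sketch, not the authors' code: vectorized Monte Carlo over R
# replications of a two-armed Bernoulli(0.5) bandit under UCB1, followed by a
# histogram of N1(10^4)/10^4. R is an assumption (reduced from 20,000).
import numpy as np


def ucb1_fraction_histogram(n=10_000, R=2_000, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.ones((R, 2))                          # each arm played once to initialize
    sums = (rng.random((R, 2)) < 0.5).astype(float)   # rewards of the two initial plays
    rows = np.arange(R)
    for t in range(3, n + 1):
        index = sums / counts + np.sqrt(2.0 * np.log(t) / counts)  # UCB1 index
        arm = np.argmax(index, axis=1)                # chosen arm in each replication
        reward = (rng.random(R) < 0.5).astype(float)  # Bernoulli(0.5) reward
        counts[rows, arm] += 1.0
        sums[rows, arm] += reward
    fractions = counts[:, 0] / n                      # N1(10^4)/10^4 per replication
    hist, edges = np.histogram(fractions, bins=20, range=(0.0, 1.0))
    return fractions, hist, edges


if __name__ == "__main__":
    fractions, hist, _ = ucb1_fraction_histogram()
    print("mean fraction of plays of arm 1:", fractions.mean())
    print("histogram counts over [0, 1]:", hist)
```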
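The Pseudocode row refers to the paper's Algorithm 1 (ETC(2)) and Algorithm 2 (ALG(Ξ, Θ, 2)), whose exact specifications are given in the paper and are not reproduced here. As a point of reference only, the sketch below is textbook explore-then-commit on a single pair of arms: explore each arm m times, then commit to the empirically better one. The exploration budget m and the hypothetical arm means are illustrative assumptions and are not the paper's parameters (m0, γ).

```python
# Reference-only sketch: textbook explore-then-commit (ETC) on two arms. This
# is NOT the paper's ETC(2) for an infinite arm population; it only
# illustrates the explore-then-commit template the paper builds on.
import numpy as np


def explore_then_commit(pull, n, m):
    """pull(arm) -> stochastic reward in [0, 1]; n = horizon; m = per-arm exploration budget."""
    rewards = np.zeros(2)
    # Exploration phase: play each arm m times.
    for arm in (0, 1):
        for _ in range(m):
            rewards[arm] += pull(arm)
    best = int(np.argmax(rewards / m))
    total = rewards.sum()
    # Commitment phase: play the empirically better arm for the rest of the horizon.
    for _ in range(n - 2 * m):
        total += pull(best)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = (0.5, 0.6)  # hypothetical arm means, for illustration only
    pull = lambda arm: float(rng.random() < means[arm])
    print(explore_then_commit(pull, n=10_000, m=100))
```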