Adaptive Learning with Unknown Information Flows

Authors: Yonatan Gur, Ahmadreza Momeni

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper we introduce a new, generalized MAB formulation in which additional information on each arm may appear arbitrarily throughout the decision horizon, and study the impact of such information flows on the achievable performance and the design of efficient decision-making policies. By obtaining matching lower and upper bounds, we characterize the (regret) complexity of this family of MAB problems as a function of the information flows.
Researcher Affiliation | Academia | Yonatan Gur, Graduate School of Business, Stanford University, Stanford, CA 94305, ygur@stanford.edu; Ahmadreza Momeni, Electrical Engineering Department, Stanford University, Stanford, CA 94305, amomenis@stanford.edu
Pseudocode | Yes | Adaptive exploration policy. Input: a tuning parameter c > 0.
  1. Set initial virtual times τ_{k,0} = 0 for all k ∈ K, and an exploration set W_0 = K.
  2. At each period t = 1, 2, ..., T:
     (a) Observe the vectors η_t and Z_t. Advance the virtual times: τ_{k,t} = (τ_{k,t−1} + 1) · exp(2η_{k,t} / (cσ²)) for all k ∈ K. Update the exploration set: W_t = { k ∈ K : n_{k,t} < (cσ²/2) · log τ_{k,t} }.
     (b) If W_t is not empty, select an arm from W_t with the fewest observations (exploration): π_t = arg min_{k ∈ W_t} n_{k,t}. Otherwise, select the arm with the highest estimated reward (exploitation): π_t = arg max_{k ∈ K} X̄_{k,n_{k,t}}.
     (c) Receive and observe the reward X_{π_t,t}.
  (A hedged Python sketch of this policy is given after the table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper describes a theoretical model and does not mention the use or public availability of any specific dataset for training.
Dataset Splits | No | The paper does not mention any specific dataset splits for validation.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper describes a theoretical policy and does not detail any empirical experiment setup with hyperparameters or training configurations.
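
For concreteness, below is a minimal Python sketch of the adaptive exploration policy as transcribed in the Pseudocode row above. It only follows the listed steps; everything else is an illustrative assumption rather than part of the paper: the function name adaptive_exploration, the parameters means, sigma, c, and aux_counts, the Gaussian reward model, and the zero default estimate for never-observed arms are all hypothetical choices made for this sketch.

import numpy as np

def adaptive_exploration(T, means, sigma, c, aux_counts, seed=0):
    # Simulates the adaptive exploration policy sketched above.
    # Illustrative assumptions (not from the paper): rewards and auxiliary
    # observations of arm k are Gaussian with mean means[k] and std sigma;
    # aux_counts[t][k] plays the role of eta_{k,t}, the number of auxiliary
    # observations of arm k arriving at period t.
    rng = np.random.default_rng(seed)
    K = len(means)
    tau = np.zeros(K)            # virtual time indexes tau_{k,t}
    n = np.zeros(K, dtype=int)   # observation counts n_{k,t}
    sums = np.zeros(K)           # running sums of observed rewards
    chosen = []

    for t in range(T):
        # (a) Observe eta_t (and Z_t): advance the virtual times and fold the
        #     auxiliary observations into the per-arm estimates.
        eta = np.asarray(aux_counts[t], dtype=int)
        tau = (tau + 1.0) * np.exp(2.0 * eta / (c * sigma ** 2))
        for k in range(K):
            if eta[k] > 0:
                z = rng.normal(means[k], sigma, size=eta[k])  # simulated Z_{k,t}
                sums[k] += z.sum()
                n[k] += eta[k]

        # Exploration set W_t: arms with fewer than (c*sigma^2/2)*log(tau_{k,t}) observations.
        threshold = (c * sigma ** 2 / 2.0) * np.log(tau)
        W = np.flatnonzero(n < threshold)

        # (b) Explore the least-observed arm in W_t; otherwise exploit the arm
        #     with the highest empirical mean (defaulting to 0 if unobserved).
        if W.size > 0:
            arm = int(W[np.argmin(n[W])])
        else:
            arm = int(np.argmax(sums / np.maximum(n, 1)))

        # (c) Receive and observe the reward X_{pi_t, t}.
        x = rng.normal(means[arm], sigma)
        sums[arm] += x
        n[arm] += 1
        chosen.append(arm)

    return chosen

# Example usage (hypothetical setting): three arms, with one auxiliary
# observation of arm 2 arriving every 50 periods.
# aux = np.zeros((1000, 3), dtype=int); aux[::50, 2] = 1
# picks = adaptive_exploration(1000, means=[0.3, 0.5, 0.45], sigma=0.1, c=8.0, aux_counts=aux)

In this reading of the pseudocode, the exponential advance of the virtual times is what lets auxiliary observations substitute for active exploration: each arriving observation raises the exploration threshold (cσ²/2) · log τ_{k,t} by exactly one, so the arm is treated as if it had already been explored one additional time.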