Adaptive Learning with Unknown Information Flows

Authors: Yonatan Gur, Ahmadreza Momeni

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper we introduce a new, generalized MAB formulation in which additional information on each arm may appear arbitrarily throughout the decision horizon, and study the impact of such information flows on the achievable performance and the design of efficient decision-making policies. By obtaining matching lower and upper bounds, we characterize the (regret) complexity of this family of MAB problems as a function of the information flows.
Researcher Affiliation | Academia | Yonatan Gur, Graduate School of Business, Stanford University, Stanford, CA 94305, ygur@stanford.edu; Ahmadreza Momeni, Electrical Engineering Department, Stanford University, Stanford, CA 94305, amomenis@stanford.edu
Pseudocode | Yes | Adaptive exploration policy. Input: a tuning parameter c > 0.
  1. Set initial virtual times τ_{k,0} = 0 for all k ∈ K, and an exploration set W_0 = K.
  2. At each period t = 1, 2, ..., T:
     (a) Observe the vectors η_t and Z_t. Advance the virtual times: τ_{k,t} = (τ_{k,t−1} + 1) · exp(2η_{k,t} / (cσ²)) for all k ∈ K. Update the exploration set: W_t = { k ∈ K : n_{k,t} < (cσ²/2) · log τ_{k,t} }.
     (b) If W_t is not empty, select an arm from W_t with the fewest observations (exploration): π_t = arg min_{k ∈ W_t} n_{k,t}. Otherwise, select the arm with the highest estimated reward (exploitation): π_t = arg max_{k ∈ K} X̄_{k,n_{k,t}}.
     (c) Receive and observe the reward X_{π_t,t}.
  (A hedged Python sketch of this policy is given after the table.)
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | The paper describes a theoretical model and does not mention the use or public availability of any specific dataset for training.
Dataset Splits | No | The paper does not mention any specific dataset splits for validation.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper describes a theoretical policy and does not detail any empirical experiment setup with hyperparameters or training configurations.
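
For concreteness, below is a minimal Python sketch of the adaptive exploration policy as transcribed in the Pseudocode row above. It only follows the listed steps; everything else is an illustrative assumption rather than part of the paper: the function name adaptive_exploration, the parameters means, sigma, c, and aux_counts, the Gaussian reward model, and the zero default estimate for never-observed arms are all hypothetical choices made for this sketch.

import numpy as np

def adaptive_exploration(T, means, sigma, c, aux_counts, seed=0):
    # Simulates the adaptive exploration policy sketched above.
    # Illustrative assumptions (not from the paper): rewards and auxiliary
    # observations of arm k are Gaussian with mean means[k] and std sigma;
    # aux_counts[t][k] plays the role of eta_{k,t}, the number of auxiliary
    # observations of arm k arriving at period t.
    rng = np.random.default_rng(seed)
    K = len(means)
    tau = np.zeros(K)            # virtual time indexes tau_{k,t}
    n = np.zeros(K, dtype=int)   # observation counts n_{k,t}
    sums = np.zeros(K)           # running sums of observed rewards
    chosen = []

    for t in range(T):
        # (a) Observe eta_t (and Z_t): advance the virtual times and fold the
        #     auxiliary observations into the per-arm estimates.
        eta = np.asarray(aux_counts[t], dtype=int)
        tau = (tau + 1.0) * np.exp(2.0 * eta / (c * sigma ** 2))
        for k in range(K):
            if eta[k] > 0:
                z = rng.normal(means[k], sigma, size=eta[k])  # simulated Z_{k,t}
                sums[k] += z.sum()
                n[k] += eta[k]

        # Exploration set W_t: arms with fewer than (c*sigma^2/2)*log(tau_{k,t}) observations.
        threshold = (c * sigma ** 2 / 2.0) * np.log(tau)
        W = np.flatnonzero(n < threshold)

        # (b) Explore the least-observed arm in W_t; otherwise exploit the arm
        #     with the highest empirical mean (defaulting to 0 if unobserved).
        if W.size > 0:
            arm = int(W[np.argmin(n[W])])
        else:
            arm = int(np.argmax(sums / np.maximum(n, 1)))

        # (c) Receive and observe the reward X_{pi_t, t}.
        x = rng.normal(means[arm], sigma)
        sums[arm] += x
        n[arm] += 1
        chosen.append(arm)

    return chosen

# Example usage (hypothetical setting): three arms, with one auxiliary
# observation of arm 2 arriving every 50 periods.
# aux = np.zeros((1000, 3), dtype=int); aux[::50, 2] = 1
# picks = adaptive_exploration(1000, means=[0.3, 0.5, 0.45], sigma=0.1, c=8.0, aux_counts=aux)

In this reading of the pseudocode, the exponential advance of the virtual times is what lets auxiliary observations substitute for active exploration: each arriving observation raises the exploration threshold (cσ²/2) · log τ_{k,t} by exactly one, so the arm is treated as if it had already been explored one additional time.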