Adaptive Learning with Unknown Information Flows
Authors: Yonatan Gur, Ahmadreza Momeni
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we introduce a new, generalized MAB formulation in which additional information on each arm may appear arbitrarily throughout the decision horizon, and study the impact of such information flows on the achievable performance and the design of efficient decision-making policies. By obtaining matching lower and upper bounds, we characterize the (regret) complexity of this family of MAB problems as a function of the information flows. |
| Researcher Affiliation | Academia | Yonatan Gur, Graduate School of Business, Stanford University, Stanford, CA 94305, ygur@stanford.edu; Ahmadreza Momeni, Electrical Engineering Department, Stanford University, Stanford, CA 94305, amomenis@stanford.edu |
| Pseudocode | Yes | Adaptive exploration policy. Input: a tuning parameter $c > 0$. 1. Set initial virtual times $\tau_{k,0} = 0$ for all $k \in \mathcal{K}$, and an exploration set $W_0 = \mathcal{K}$. 2. At each period $t = 1, 2, \dots, T$: (a) Observe the vectors $\eta_t$ and $Z_t$. Advance virtual times: $\tau_{k,t} = (\tau_{k,t-1} + 1) \cdot \exp\left(\frac{\eta_{k,t} \Delta^2}{c\sigma^2}\right)$ for all $k \in \mathcal{K}$. Update the exploration set: $W_t = \left\{ k \in \mathcal{K} \mid n_{k,t} < \frac{c\sigma^2}{\Delta^2} \log \tau_{k,t} \right\}$. (b) If $W_t$ is not empty, select an arm from $W_t$ with the fewest observations (exploration): $\pi_t = \arg\min_{k \in W_t} n_{k,t}$. Otherwise, select an arm with the highest estimated reward (exploitation): $\pi_t = \arg\max_{k \in \mathcal{K}} \bar{X}_{k,n_{k,t}}$. (c) Receive and observe a reward $X_{\pi_t,t}$. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes a theoretical model and does not mention the use or public availability of any specific dataset for training. |
| Dataset Splits | No | The paper does not mention any specific dataset splits for validation. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper describes a theoretical policy and does not detail any empirical experiment setup with hyperparameters or training configurations. |
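
The adaptive exploration policy quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the paper provides no code). The names `adaptive_exploration`, `reward_fn`, and `aux_fn` are hypothetical. The sketch assumes that the symbol dropped from the extracted formulas is a known lower bound on the reward gap (`delta`), that each `eta[k]` is 0 or 1, that auxiliary observations count toward the observation count of an arm, and that an arm with no observations has an estimated reward of 0.0.

```python
import math
import random


def adaptive_exploration(reward_fn, aux_fn, num_arms, horizon, c, sigma, delta):
    """Sketch of the adaptive exploration policy (assumptions in the lead-in).

    reward_fn(k) -> float: hypothetical sampler for the reward of arm k.
    aux_fn(t) -> (eta, z): hypothetical auxiliary-information source;
        eta[k] is 0 or 1, and z[k] is used only when eta[k] == 1.
    Returns the list of pulled arms and the final observation counts.
    """
    scale = c * sigma ** 2 / delta ** 2   # c * sigma^2 / Delta^2
    tau = [0.0] * num_arms                # virtual times, tau_{k,0} = 0
    counts = [0] * num_arms               # observations per arm (pulls + auxiliary)
    sums = [0.0] * num_arms               # running sums for empirical means
    pulls = []

    def mean(k):
        # Assumption: an arm with no observations gets estimate 0.0.
        return sums[k] / counts[k] if counts[k] else 0.0

    for t in range(1, horizon + 1):
        # (a) Observe auxiliary information and advance virtual times.
        eta, z = aux_fn(t)
        for k in range(num_arms):
            if eta[k]:
                counts[k] += 1
                sums[k] += z[k]
            tau[k] = (tau[k] + 1.0) * math.exp(eta[k] / scale)

        # Exploration set: arms with too few observations for their virtual time.
        explore = [k for k in range(num_arms)
                   if counts[k] < scale * math.log(tau[k])]

        # (b) Explore the least-observed arm in the set, otherwise exploit.
        if explore:
            arm = min(explore, key=lambda k: counts[k])
        else:
            arm = max(range(num_arms), key=mean)

        # (c) Receive and record the reward.
        x = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += x
        pulls.append(arm)

    return pulls, counts


if __name__ == "__main__":
    random.seed(0)
    means = [0.2, 0.8]                               # hypothetical arm means
    noisy = lambda k: random.gauss(means[k], 1.0)    # Gaussian rewards, sigma = 1
    no_aux = lambda t: ([0, 0], [0.0, 0.0])          # no auxiliary information
    _, final_counts = adaptive_exploration(noisy, no_aux, 2, 1000, 2.0, 1.0, 0.5)
    print(final_counts)
```

With no auxiliary information, the virtual time of each arm equals the period index, so the exploration threshold grows like `scale * log(t)`. Each auxiliary observation multiplies an arm's virtual time by `exp(1 / scale)`, which raises that arm's threshold by one observation.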