Bayesian decision-making under misspecified priors with applications to meta-learning
Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel J. Hsu, Thodoris Lykouris, Miro Dudik, Robert E. Schapire
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical simulations, we illustrate how prior misspecification and the deployment of one-step look-ahead (as in KG) can impact the convergence of meta-learning in multi-armed and contextual bandits with structured and correlated priors. (See the knowledge-gradient sketch after the table.) |
| Researcher Affiliation | Collaboration | Massachusetts Institute of Technology (msimchow@mit.edu); Columbia University; Microsoft Research NYC; Massachusetts Institute of Technology |
| Pseudocode | No | The paper describes algorithms in text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using synthetic data for experiments (e.g., 'synthetic experiments in multi-armed and contextual bandit settings', 'synthetic linear contextual bandit problem') but does not provide access information (link, DOI, or citation) for these datasets, nor does it refer to well-known public datasets. |
| Dataset Splits | No | The paper describes its experimental setup and meta-learning approach but does not specify dataset splits (e.g., percentages or counts) for training, validation, or testing. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | Our first scenario is a multi-armed bandit problem with Gaussian prior and Gaussian rewards. The instance has \|A\| = 6 arms and each episode has horizon H = 10. The prior is N(ν, Ψ) where ν = [0.5, 0, 0, 0.1, 0, 0] and Ψ has block structure... The rewards are Gaussian with variance 1... Both meta-learners are run in an explore-then-commit fashion where the first T0 episodes are used for exploration. (See the simulation sketch after the table.) |
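
Since the paper releases no code, the following is a minimal, hedged sketch of the first scenario quoted above: a 6-armed Gaussian bandit with horizon H = 10, reward variance 1, and a correlated Gaussian prior N(ν, Ψ), played by Thompson sampling under a possibly misspecified prior, plus an explore-then-commit meta-learner that estimates the prior from the first T0 episodes. The specific block structure of Ψ, the uniform-sweep exploration, the moment-based prior estimator, and the T0/T values are all assumptions for illustration; the paper specifies only the quantities quoted in the table.

```python
import numpy as np

rng = np.random.default_rng(0)

A, H = 6, 10                                   # arms, per-episode horizon (from the paper)
nu = np.array([0.5, 0.0, 0.0, 0.1, 0.0, 0.0])  # true prior mean (from the paper)
# Hypothetical block-diagonal Psi: the paper says only that Psi "has block structure".
blk = 0.25 * np.eye(3) + 0.15 * (np.ones((3, 3)) - np.eye(3))
Psi = np.block([[blk, np.zeros((3, 3))], [np.zeros((3, 3)), blk]])
sigma2 = 1.0                                   # Gaussian reward variance (from the paper)

def run_episode(nu_prior, Psi_prior):
    """One H-step episode of Thompson sampling under a (possibly wrong) prior."""
    theta = rng.multivariate_normal(nu, Psi)       # arm means drawn from the TRUE prior
    mean, cov = nu_prior.copy(), Psi_prior.copy()  # agent's belief starts at ITS prior
    total = 0.0
    for _ in range(H):
        a = int(np.argmax(rng.multivariate_normal(mean, cov)))  # posterior sample
        r = theta[a] + rng.normal(0.0, np.sqrt(sigma2))
        total += r
        # Conjugate Gaussian update after observing reward r on arm a.
        e = np.zeros(A)
        e[a] = 1.0
        gain = cov @ e / (e @ cov @ e + sigma2)
        mean = mean + gain * (r - mean[a])
        cov = cov - np.outer(gain, e @ cov)
        cov = (cov + cov.T) / 2                    # keep numerically symmetric
    return total

def explore_then_commit(T0, T):
    """Meta-learner: estimate the prior from T0 exploration episodes, then commit."""
    sweeps = []
    for _ in range(T0):
        theta = rng.multivariate_normal(nu, Psi)
        # H = 10 >= A = 6, so one uniform sweep over all arms fits in an episode.
        sweeps.append(theta + rng.normal(0.0, np.sqrt(sigma2), size=A))
    S = np.stack(sweeps)
    nu_hat = S.mean(axis=0)
    # Debias the sample covariance for reward noise, then project to the PSD cone.
    Psi_hat = np.cov(S, rowvar=False) - sigma2 * np.eye(A)
    w, V = np.linalg.eigh(Psi_hat)
    Psi_hat = (V * np.clip(w, 1e-6, None)) @ V.T
    return np.mean([run_episode(nu_hat, Psi_hat) for _ in range(T - T0)])

# Misspecified fixed prior vs. meta-learned prior vs. the true prior.
print(np.mean([run_episode(np.zeros(A), np.eye(A)) for _ in range(2000)]))
print(explore_then_commit(T0=200, T=2200))
print(np.mean([run_episode(nu, Psi) for _ in range(2000)]))
```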
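
The table also references one-step look-ahead "as in KG". The sketch below is the standard knowledge-gradient rule for independent Gaussian beliefs (the z·Φ(z) + φ(z) form of Frazier et al.); the paper's priors are correlated and its exact look-ahead rule is more involved, so treat this as an illustration of the principle, not the paper's method. The example inputs are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def kg_choice(mu, var, sigma2):
    """One-step look-ahead (knowledge-gradient) arm selection, independent beliefs."""
    mu, var = np.asarray(mu, dtype=float), np.asarray(var, dtype=float)
    # Std. dev. of the change in each arm's posterior mean after one more pull.
    sig_tilde = var / np.sqrt(var + sigma2)
    # Gap to the best competing posterior mean, normalized by that change scale.
    best_other = np.array([np.delete(mu, a).max() for a in range(mu.size)])
    z = -np.abs(mu - best_other) / np.maximum(sig_tilde, 1e-12)
    # Expected one-step improvement in the identified best mean.
    kg = sig_tilde * (z * norm.cdf(z) + norm.pdf(z))
    return int(np.argmax(kg))

# Example: the high-variance arm 2 wins the look-ahead despite a lower mean; prints 2.
print(kg_choice(mu=[0.5, 0.0, 0.45], var=[0.01, 0.01, 1.0], sigma2=1.0))
```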