Meta-Learning for Simple Regret Minimization
Authors: Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we complement our theory with experiments (Section 7), which show the benefits of meta-learning and confirm that the Bayesian approaches are superior whenever implementable. In this section, we empirically compare our algorithms by their average meta simple regret over 100 simulation runs. |
| Researcher Affiliation | Collaboration | 1University of Southern California 2Amazon 3Google Research azizim@usc.edu, bkveton@amazon.com, ghavamza@google.com, katsumee@amazon.com |
| Pseudocode | Yes | Algorithm 1: Bayesian Meta-SRM (B-meta SRM) and Algorithm 2: Frequentist Meta-SRM (f-meta SRM) |
| Open Source Code | No | The paper does not contain any explicit statement about making its code open source or provide a link to a code repository for its methodology. |
| Open Datasets | No | The paper mentions simulations and refers to a 'real-world dataset in Appendix F.1' but does not provide concrete access information (link, DOI, specific citation with author/year for public access) for any dataset used in the experiments. |
| Dataset Splits | No | The paper describes its evaluation based on '100 simulation runs' and interacting with 'm bandit problems with arm set A that appear sequentially', but does not provide specific train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | All experiments have m = 200 tasks with n = 100 rounds in each. Specifically, we assume that A = [K] are K arms with a Gaussian reward distribution ν_s(a; µ_s) = N(µ_s(a), 10^2), so σ = 10. The mean reward is sampled as µ_s ~ P_θ = N(θ, 0.12^2 I_K), so Σ_0 = 0.12^2 I_K. The prior parameter is sampled from the meta-prior as θ ~ Q = N(0_K, I_K), i.e., Σ_q = I_K. We tune m_0 and report the point-wise best performance for each task. |
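
The experiment setup quoted above fully specifies the simulated environment: a meta-prior Q = N(0_K, I_K) over the prior parameter θ, a task prior P_θ = N(θ, 0.12^2 I_K) over mean rewards, and Gaussian rewards with σ = 10 over m = 200 tasks of n = 100 rounds, averaged over 100 runs. The sketch below reproduces only this sampling hierarchy and the meta simple regret computation; the exploration policy is a naive uniform round-robin placeholder, not the paper's B-meta SRM or f-meta SRM algorithms, and K = 10 is an assumed value since the quoted setup does not fix K.

```python
# Minimal sketch of the simulated environment from the quoted experiment setup.
# The round-robin agent is a placeholder baseline, NOT the paper's algorithms.
import numpy as np

rng = np.random.default_rng(0)

K = 10          # number of arms (assumed; not fixed in the quoted setup)
m = 200         # number of sequentially arriving tasks
n = 100         # exploration budget (rounds) per task
sigma = 10.0    # reward noise std: rewards ~ N(mu_s(a), 10^2)
sigma_0 = 0.12  # prior std: mu_s ~ N(theta, 0.12^2 I_K)
num_runs = 100  # simulation runs averaged in the paper


def simple_regret_one_task(mu_s, rng):
    """Uniform round-robin exploration, then recommend the empirical best arm."""
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(n):
        a = t % K                              # placeholder policy: cycle through arms
        sums[a] += rng.normal(mu_s[a], sigma)  # noisy Gaussian reward draw
        counts[a] += 1
    recommended = np.argmax(sums / np.maximum(counts, 1))
    return mu_s.max() - mu_s[recommended]      # simple regret of the recommendation


def one_meta_run(rng):
    """Sample theta from the meta-prior, then m tasks from the induced prior."""
    theta = rng.normal(0.0, 1.0, size=K)       # theta ~ Q = N(0_K, I_K)
    regrets = []
    for _ in range(m):
        mu_s = theta + rng.normal(0.0, sigma_0, size=K)  # mu_s ~ P_theta
        regrets.append(simple_regret_one_task(mu_s, rng))
    return np.mean(regrets)                    # average simple regret over the m tasks


meta_regret = np.mean([one_meta_run(rng) for _ in range(num_runs)])
print(f"average meta simple regret (uniform baseline): {meta_regret:.3f}")
```

Swapping the placeholder round-robin policy for the paper's Bayesian or frequentist meta-SRM allocation (Algorithms 1 and 2) would yield the comparison reported in Section 7; the tuning of m_0 mentioned in the setup is likewise not represented here.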