Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Meta-Learning for Simple Regret Minimization
Authors: Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya
AAAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we complement our theory with experiments (Section 7), which show the benefits of meta-learning and confirm that the Bayesian approaches are superior whenever implementable. In this section, we empirically compare our algorithms by their average meta simple regret over 100 simulation runs. |
| Researcher Affiliation | Collaboration | ¹University of Southern California, ²Amazon, ³Google Research |
| Pseudocode | Yes | Algorithm 1: Bayesian Meta-SRM (B-meta SRM) and Algorithm 2: Frequentist Meta-SRM (f-meta SRM) |
| Open Source Code | No | The paper does not contain any explicit statement about making its code open source or provide a link to a code repository for its methodology. |
| Open Datasets | No | The paper mentions simulations and refers to a 'real-world dataset in Appendix F.1' but does not provide concrete access information (link, DOI, specific citation with author/year for public access) for any dataset used in the experiments. |
| Dataset Splits | No | The paper describes its evaluation based on '100 simulation runs' and interacting with 'm bandit problems with arm set A that appear sequentially', but does not provide specific train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud instance specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | All experiments have m = 200 tasks with n = 100 rounds in each. Specifically, we assume that A = [K] are K arms with a Gaussian reward distribution ν_s(a; µ_s) = N(µ_s(a), 10^2), so σ = 10. The mean reward is sampled as µ_s ~ P_θ = N(θ, 0.12^2 I_K), so Σ_0 = 0.12^2 I_K. The prior parameter is sampled from meta-prior as θ ~ Q = N(0_K, I_K), i.e., Σ_q = I_K. We tune m_0 and report the point-wise best performance for each task. |
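
The Experiment Setup row describes a hierarchical Gaussian sampling scheme that is easy to misread in flattened form. The following is a minimal sketch of that scheme, not the authors' code. The number of arms `K = 4` is a hypothetical value (the quoted text does not give one), the prior standard deviation 0.12 is taken as quoted, and the round-robin recommendation rule is a placeholder baseline rather than either of the paper's algorithms (B-meta SRM or f-meta SRM). All function names are illustrative.

```python
import random

K = 4             # number of arms: hypothetical, not stated in the quoted setup
M_TASKS = 200     # m = 200 tasks
N_ROUNDS = 100    # n = 100 rounds per task
SIGMA = 10.0      # reward noise standard deviation (sigma = 10)
PRIOR_STD = 0.12  # task-prior standard deviation, as quoted (Sigma_0 = 0.12^2 I_K)


def sample_theta(rng, k=K):
    """Draw the prior parameter theta ~ Q = N(0_K, I_K)."""
    return [rng.gauss(0.0, 1.0) for _ in range(k)]


def sample_task_means(rng, theta, prior_std=PRIOR_STD):
    """Draw one task's mean rewards mu_s ~ N(theta, prior_std^2 I_K)."""
    return [rng.gauss(t, prior_std) for t in theta]


def pull(rng, mu, arm, sigma=SIGMA):
    """Observe a noisy reward from N(mu[arm], sigma^2)."""
    return rng.gauss(mu[arm], sigma)


def uniform_recommendation(rng, mu, n=N_ROUNDS):
    """Placeholder baseline (NOT the paper's method): pull arms round-robin
    for n rounds, then recommend the arm with the highest empirical mean."""
    k = len(mu)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(n):
        arm = t % k
        sums[arm] += pull(rng, mu, arm)
        counts[arm] += 1
    means = [s / max(c, 1) for s, c in zip(sums, counts)]
    return max(range(k), key=means.__getitem__)


def simple_regret(mu, arm):
    """Gap between the best mean reward and the recommended arm's mean."""
    return max(mu) - mu[arm]


def mean_simple_regret_over_tasks(seed=0):
    """Run one simulation: fix theta, then average simple regret over m tasks."""
    rng = random.Random(seed)
    theta = sample_theta(rng)
    total = 0.0
    for _ in range(M_TASKS):
        mu = sample_task_means(rng, theta)
        total += simple_regret(mu, uniform_recommendation(rng, mu))
    return total / M_TASKS
```

Calling `mean_simple_regret_over_tasks(seed)` for 100 different seeds and averaging the results would mirror the "100 simulation runs" quoted in the Research Type row, under the assumptions stated above.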