Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Meta-Learning for Simple Regret Minimization

Authors: Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we complement our theory with experiments (Section 7), which show the benefits of meta-learning and confirm that the Bayesian approaches are superior whenever implementable. In this section, we empirically compare our algorithms by their average meta simple regret over 100 simulation runs.
Researcher Affiliation | Collaboration | ¹University of Southern California, ²Amazon, ³Google Research
Pseudocode | Yes | Algorithm 1: Bayesian Meta-SRM (B-meta SRM) and Algorithm 2: Frequentist Meta-SRM (f-meta SRM)
Open Source Code | No | The paper does not contain an explicit statement about releasing its code, nor a link to a code repository for its methodology.
Open Datasets | No | The paper mentions simulations and refers to a "real-world dataset in Appendix F.1" but does not provide concrete access information (link, DOI, or a specific citation for public access) for any dataset used in the experiments.
Dataset Splits | No | The paper describes its evaluation in terms of "100 simulation runs" on "m bandit problems with arm set A that appear sequentially", but does not provide specific train/validation/test dataset splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or cloud instance specifications) used for running its experiments.
Software Dependencies | No | The paper does not list ancillary software, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | All experiments have m = 200 tasks with n = 100 rounds each. Specifically, we assume that A = [K] are K arms with a Gaussian reward distribution ν_s(a; µ_s) = N(µ_s(a), 10²), so σ = 10. The mean reward is sampled as µ_s ~ P_θ = N(θ, 0.12² I_K), so Σ_0 = 0.12² I_K. The prior parameter is sampled from the meta-prior as θ ~ Q = N(0_K, I_K), i.e., Σ_q = I_K. We tune m_0 and report the point-wise best performance for each task.
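The experiment setup row fully specifies the reward-generation hierarchy, which can be reproduced directly. Below is a minimal sketch of that hierarchy in NumPy: the meta-prior, the per-task prior, and the Gaussian reward noise follow the reported parameters, while the bandit algorithm itself (not specified in this excerpt) is replaced by a placeholder uniform-sampling agent that recommends the empirical best arm. K, the random seed, and the agent are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5          # number of arms (illustrative; not fixed in the excerpt)
m = 200        # tasks, as reported
n = 100        # rounds per task, as reported
sigma = 10.0   # reward noise std: nu_s(a; mu_s) = N(mu_s(a), 10^2)
sigma0 = 0.12  # task-prior std: Sigma_0 = 0.12^2 I_K

# Meta-prior: theta ~ Q = N(0_K, I_K)
theta = rng.normal(0.0, 1.0, size=K)

simple_regrets = []
for s in range(m):
    # Task prior: mu_s ~ P_theta = N(theta, 0.12^2 I_K)
    mu = rng.normal(theta, sigma0)
    # Placeholder agent: pull arms round-robin, then recommend empirical best
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(n):
        a = t % K
        sums[a] += rng.normal(mu[a], sigma)  # reward ~ N(mu_s(a), 10^2)
        counts[a] += 1
    rec = int(np.argmax(sums / counts))
    # Simple regret: gap between the best mean and the recommended arm's mean
    simple_regrets.append(mu.max() - mu[rec])

avg_meta_simple_regret = float(np.mean(simple_regrets))
print(avg_meta_simple_regret)
```

Averaging this quantity over independent runs of the whole loop corresponds to the "average meta simple regret over 100 simulation runs" reported in the paper, with the placeholder agent swapped for B-meta SRM or f-meta SRM.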