Meta-Learning for Simple Regret Minimization

Authors: Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

AAAI 2023

Reproducibility assessment: each entry lists the variable, the assessed result, and the LLM response supporting it.
Research Type: Experimental. "Finally, we complement our theory with experiments (Section 7), which show the benefits of meta-learning and confirm that the Bayesian approaches are superior whenever implementable. In this section, we empirically compare our algorithms by their average meta simple regret over 100 simulation runs." A hedged sketch of this evaluation metric follows the Experiment Setup entry below.
Researcher Affiliation: Collaboration. University of Southern California, Amazon, and Google Research (azizim@usc.edu, bkveton@amazon.com, ghavamza@google.com, katsumee@amazon.com).
Pseudocode: Yes. Algorithm 1 (Bayesian Meta-SRM, B-meta SRM) and Algorithm 2 (Frequentist Meta-SRM, f-meta SRM).
Open Source Code: No. The paper contains no explicit statement about releasing its code and provides no link to a code repository for its methodology.
Open Datasets: No. The paper mentions simulations and refers to a 'real-world dataset in Appendix F.1', but gives no concrete access information (link, DOI, or a citation sufficient for public access) for any dataset used in the experiments.
Dataset Splits: No. The evaluation is based on '100 simulation runs' over 'm bandit problems with arm set A that appear sequentially'; no train/validation/test splits are specified.
Hardware Specification: No. The paper does not state the hardware (e.g., CPU or GPU models, or cloud instance specifications) used to run its experiments.
Software Dependencies: No. The paper does not list ancillary software, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup: Yes. "All experiments have m = 200 tasks with n = 100 rounds in each. Specifically, we assume that A = [K] are K arms with a Gaussian reward distribution ν_s(a; µ_s) = N(µ_s(a), 10^2), so σ = 10. The mean reward is sampled as µ_s ~ P_θ = N(θ, 0.12^2 I_K), so Σ_0 = 0.12^2 I_K. The prior parameter is sampled from the meta-prior as θ ~ Q = N(0_K, I_K), i.e., Σ_q = I_K. We tune m_0 and report the point-wise best performance for each task." A simulation sketch based on this setup appears below.
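
For reference, the quoted setup translates into the minimal simulation sketch below. It is an assumption-laden illustration, not the authors' code: the variable names, the random seed, and the choice K = 10 are hypothetical, while the values of m, n, σ, Σ_0, and Σ_q follow the quoted text.

```python
import numpy as np

# Hedged sketch of the hierarchical sampling described in the quoted setup.
K = 10            # number of arms (the quoted text does not fix K; assumed here)
m = 200           # tasks per simulation run
n = 100           # rounds per task
sigma = 10.0      # reward noise std: rewards ~ N(mu_s(a), 10^2)

rng = np.random.default_rng(0)

# Meta-prior: theta ~ Q = N(0_K, I_K), i.e., Sigma_q = I_K.
theta = rng.normal(loc=0.0, scale=1.0, size=K)

# Task priors: mu_s ~ P_theta = N(theta, 0.12^2 I_K), i.e., Sigma_0 = 0.12^2 I_K.
task_means = rng.normal(loc=theta, scale=0.12, size=(m, K))

def pull(task: int, arm: int) -> float:
    """Sample one reward ~ N(mu_s(arm), sigma^2) for the given task."""
    return float(rng.normal(task_means[task, arm], sigma))
```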
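
The experiments are reported as average meta simple regret over 100 simulation runs. Assuming the standard definition of simple regret (the gap between the best mean reward and the mean reward of the arm recommended after n rounds of exploration), the sketch below reuses the environment above to show how such a metric could be computed; the uniform-exploration agent is only a placeholder, not B-meta SRM or f-meta SRM.

```python
def recommend_uniform(task: int) -> int:
    """Placeholder agent: pull arms round-robin for n rounds, then recommend
    the arm with the highest empirical mean. Not one of the paper's algorithms."""
    totals = np.zeros(K)
    counts = np.zeros(K)
    for t in range(n):
        a = t % K
        totals[a] += pull(task, a)
        counts[a] += 1
    return int(np.argmax(totals / np.maximum(counts, 1)))

def meta_simple_regret() -> float:
    """Average simple regret across the m tasks of one simulation run:
    regret_s = max_a mu_s(a) - mu_s(recommended arm)."""
    regrets = [
        task_means[s].max() - task_means[s, recommend_uniform(s)]
        for s in range(m)
    ]
    return float(np.mean(regrets))

# The reported metric averages this quantity over 100 independent simulation runs.
```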