Multi-Objective Generalized Linear Bandits
Authors: Shiyin Lu, Guanghui Wang, Yao Hu, Lijun Zhang
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate the effectiveness of our method. In this section, we conduct numerical experiments to compare our algorithm with the following multi-objective bandit algorithms. As can be seen from Fig. 1, where the vertical axis represents the cumulative Pareto regret up to round t, our algorithm significantly outperforms its competitors in all experiments. |
| Researcher Affiliation | Collaboration | Shiyin Lu¹, Guanghui Wang¹, Yao Hu² and Lijun Zhang¹. ¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; ²Youku Cognitive and Intelligent Lab, Alibaba Group, Beijing 100102, China. {lusy, wanggh, zhanglj}@lamda.nju.edu.cn, yaoohu@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 MOGLB-UCB |
| Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code is publicly available. |
| Open Datasets | No | We use a synthetic dataset constructed as follows. For each objective i ∈ [m], we sample the coefficients θ_i uniformly from the positive part of the unit ball. To control the size of the Pareto front, we generate an arm set comprising 4d arms as follows. We first draw 3d arms uniformly from the centered ball of radius 0.5, and then sample d arms uniformly from the centered unit ball. We repeat this process until the size of the Pareto front is not more than d. (A generation sketch is given below the table.) |
| Dataset Splits | No | The paper uses a synthetic dataset but does not specify explicit training, validation, or test splits, nor does it mention cross-validation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions two models (probit model and logit model) used for generating reward components, but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | In our algorithm, there is a parameter λ. Since its only role is to make Z_t invertible and our algorithm is insensitive to it, we simply set λ = max(1, κ/2). Following common practice in bandit learning [Zhang et al., 2016; Jun et al., 2017], we also tune the width of the confidence set γ_t as c · log(det(Z_t) / det(Z_1)), where c is searched within [1e-3, 1]. We let m = 5 and pick d from {5, 10, 15}. For each objective i ∈ [m], we sample the coefficients θ_i uniformly from the positive part of the unit ball. We perform 10 trials up to round T = 3000 and report the average performance of the algorithms. (A parameter sketch is given below the table.) |
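
The synthetic-data recipe quoted in the Open Datasets row can be made concrete with a short script. The following is a minimal sketch assuming NumPy: `sample_ball`, `pareto_front`, and `generate_arm_set` are hypothetical helper names, the `max_tries` cap on the rejection loop is an added safeguard not mentioned in the paper, and the Pareto front is computed from the linear scores θ_iᵀx, which gives the same front as the transformed rewards because the probit/logit links are monotone.

```python
import numpy as np

def sample_ball(n, d, radius=1.0, rng=None):
    """Draw n points uniformly from the d-dimensional centered ball."""
    rng = np.random.default_rng() if rng is None else rng
    directions = rng.standard_normal((n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return directions * radii

def pareto_front(scores):
    """Indices of arms not dominated in every objective (scores: num_arms x m)."""
    front = []
    for a in range(scores.shape[0]):
        dominated = np.any(
            np.all(scores >= scores[a], axis=1) & np.any(scores > scores[a], axis=1)
        )
        if not dominated:
            front.append(a)
    return front

def generate_arm_set(theta, d, rng=None, max_tries=1000):
    """Redraw 4d arms (3d from the radius-0.5 ball, d from the unit ball)
    until the Pareto front has at most d arms, as described in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(max_tries):
        arms = np.vstack([
            sample_ball(3 * d, d, radius=0.5, rng=rng),
            sample_ball(d, d, radius=1.0, rng=rng),
        ])
        # Monotone link functions preserve the ordering of theta_i^T x,
        # so the Pareto front can be computed on the linear scores directly.
        if len(pareto_front(arms @ theta.T)) <= d:
            return arms
    raise RuntimeError("no arm set with a small enough Pareto front was found")

rng = np.random.default_rng(0)
m, d = 5, 5                                   # m = 5 objectives, d picked from {5, 10, 15}
theta = np.abs(sample_ball(m, d, rng=rng))    # uniform on the positive part of the unit ball
arms = generate_arm_set(theta, d, rng=rng)    # 4d candidate arms
```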
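Similarly, the parameter choices quoted in the Experiment Setup row can be sketched as below. This is not the paper's Algorithm 1 (MOGLB-UCB); the value of κ, the grid over c, and the rank-one update of the design matrix Z_t are illustrative assumptions.

```python
import numpy as np

d = 5                      # dimension (assumed, matching the setup above)
kappa = 0.25               # assumed lower bound on the link-function derivative

# lambda only needs to make Z_t invertible, so the paper sets it to max(1, kappa/2).
lam = max(1.0, kappa / 2.0)

Z1 = lam * np.eye(d)       # Z_1 = lambda * I
Zt = Z1.copy()

def confidence_width(Zt, Z1, c):
    """gamma_t = c * log(det(Z_t) / det(Z_1)), with c grid-searched within [1e-3, 1]."""
    _, logdet_t = np.linalg.slogdet(Zt)
    _, logdet_1 = np.linalg.slogdet(Z1)
    return c * (logdet_t - logdet_1)

# Assumed discretization of the search range [1e-3, 1] for c.
c_grid = np.logspace(-3, 0, 7)

# Assumed standard design-matrix update after pulling arm x_t: Z_{t+1} = Z_t + x_t x_t^T.
x_t = np.random.default_rng(1).standard_normal(d)
x_t /= np.linalg.norm(x_t)
Zt = Zt + np.outer(x_t, x_t)

print({round(c, 3): round(confidence_width(Zt, Z1, c), 4) for c in c_grid})
```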