Lipschitz Bandits with Batched Feedback

Authors: Yasong Feng, Zengfeng Huang, Tianyu Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical studies of A-BLiN. In the experiments, we use the arm space A = [0, 1]^2 and the expected reward function µ(x) = 1 - (1/2)||x - x1||_2 - (3/10)||x - x2||_2, where x1 = (0.8, 0.7) and x2 = (0.1, 0.1). The landscape of µ and the resulting partition are shown in Figure 2(a). (See the reward-function sketch after the table.)
Researcher Affiliation | Academia | Yasong Feng, Shanghai Center for Mathematical Sciences, Fudan University (ysfeng20@fudan.edu.cn); Zengfeng Huang, School of Data Science, Fudan University (huangzf@fudan.edu.cn); Tianyu Wang, Shanghai Center for Mathematical Sciences, Fudan University (wangtianyu@fudan.edu.cn)
Pseudocode | Yes | Algorithm 1: Batched Lipschitz Narrowing (BLiN). (A hedged sketch of the narrowing loop appears after the table.)
Open Source Code | Yes | Our code is available at https://github.com/FengYasong-fifol/Batched-Lipschitz-Narrowing.
Open Datasets | No | The paper defines a synthetic arm space A = [0, 1]^2 and an expected reward function µ(x) for its experiments, rather than using a publicly available dataset. Therefore, no access information for a public dataset is provided or applicable.
Dataset Splits | No | The paper describes a simulated bandit problem rather than experiments on a dataset with traditional train/validation/test splits. Therefore, no specific dataset split information is provided.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It mentions theoretical time complexity but no actual hardware used.
Software Dependencies | No | The paper states "Our code is available at https://github.com/FengYasong-fifol/Batched-Lipschitz-Narrowing" and mentions in the ethics checklist that training details were specified, but it does not list specific software dependencies with version numbers within the main text or any directly quoted section.
Experiment Setup | Yes | In the experiments, we use the arm space A = [0, 1]^2 and the expected reward function µ(x) = 1 - (1/2)||x - x1||_2 - (3/10)||x - x2||_2, where x1 = (0.8, 0.7) and x2 = (0.1, 0.1). ... We let the time horizon T = 80000. ... For this experiment, r1 = 1/8, r3 = 1/16, r4 = 1/32, which is the ACE sequence (rounded as in Remark 2) for d = 2 and dz = 0. (See the configuration sketch after the table.)
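
The reward surface quoted in the Research Type and Experiment Setup rows is easy to reproduce. The snippet below is a direct transcription of the quoted µ into Python; it is not code from the authors' repository.

```python
import numpy as np

# Peak locations quoted from the paper's experiment section.
X1 = np.array([0.8, 0.7])
X2 = np.array([0.1, 0.1])

def mu(x):
    """Expected reward mu(x) = 1 - (1/2)||x - x1||_2 - (3/10)||x - x2||_2."""
    x = np.asarray(x, dtype=float)
    return 1.0 - 0.5 * np.linalg.norm(x - X1) - 0.3 * np.linalg.norm(x - X2)

# The maximizer is exactly x1, since the 1/2-weighted cone dominates the
# 3/10-weighted one; the optimal expected reward is about 0.7234.
print(mu(X1))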
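The Pseudocode row refers to Algorithm 1 (BLiN). As a reading aid, here is a minimal, hypothetical sketch of the narrowing loop: play every active dyadic cube at its center for a per-cube budget, use the batch's feedback only once the batch ends, eliminate cubes whose empirical mean falls far below the best, and refine the survivors to the next edge length. The per-cube budget n_m, the elimination constant 4·r_m, the Gaussian noise model, and the cleanup rule are our simplifications, not the paper's exact choices.

```python
import itertools
import numpy as np

def blin(mu, T=80000, r_seq=(1/8, 1/16, 1/32), d=2, noise=0.1, seed=0):
    """Minimal sketch of Batched Lipschitz Narrowing on A = [0, 1]^d.

    Assumed simplifications: arms are played at cube centers, the per-cube
    budget is n_m = ceil(r_m^-2), rewards carry Gaussian noise, and cubes
    more than 4 * r_m below the best empirical mean are eliminated. See
    Algorithm 1 in the paper for the exact procedure and constants.
    """
    rng = np.random.default_rng(seed)
    # Initial partition: dyadic cubes of edge r_1, stored by lower corner.
    ticks = np.arange(0.0, 1.0, r_seq[0])
    cubes = [np.array(c) for c in itertools.product(ticks, repeat=d)]
    t, total = 0, 0.0
    best_center = np.full(d, 0.5)
    for m, r in enumerate(r_seq):
        n_m = int(np.ceil(r ** -2))           # plays per active cube
        if t + n_m * len(cubes) > T:          # batch would exceed the horizon
            break
        means = []
        for corner in cubes:
            center = corner + r / 2.0
            # Rewards are collected now but only *used* after the batch,
            # matching the batched-feedback constraint.
            rewards = mu(center) + rng.normal(0.0, noise, size=n_m)
            means.append(rewards.mean())
            total += rewards.sum()
            t += n_m
        best = int(np.argmax(means))
        best_center = cubes[best] + r / 2.0
        survivors = [c for c, v in zip(cubes, means)
                     if v >= means[best] - 4.0 * r]
        if m + 1 < len(r_seq):                # refine survivors to edge r_{m+1}
            k = int(round(r / r_seq[m + 1]))  # children per axis
            offsets = [np.array(o) * r_seq[m + 1]
                       for o in itertools.product(range(k), repeat=d)]
            cubes = [c + o for c in survivors for o in offsets]
    # Cleanup: spend any leftover budget on the empirically best cube.
    if t < T:
        total += (mu(best_center) + rng.normal(0.0, noise, size=T - t)).sum()
    return total
```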
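Plugging the quoted setup into the sketch above: T = 80000 on A = [0, 1]^2. The halving edge-length schedule (1/8, 1/16, 1/32) below is an illustrative stand-in for the rounded ACE sequence, and the noise level and seed are likewise our assumptions; `mu` and `blin` are the functions defined in the previous two snippets.

```python
import numpy as np

X1 = np.array([0.8, 0.7])
X2 = np.array([0.1, 0.1])
MU_STAR = 1.0 - 0.3 * np.linalg.norm(X1 - X2)  # optimum at x1, ~0.7234

T = 80000                                      # horizon quoted from the paper
total = blin(mu, T=T, r_seq=(1/8, 1/16, 1/32), d=2, seed=1)
print(f"empirical regret over T = {T}: {T * MU_STAR - total:.1f}")
```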