Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Lipschitz Bandits in Optimal Space

Authors: Xiaoyi Zhu, Zengfeng Huang

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also conduct numerical simulations, and the results show that our new algorithm achieves regret comparable to the state-of-the-art while reducing memory usage by orders of magnitude. In this section, we evaluate the Log-Li algorithm."
Researcher Affiliation | Academia | "Xiaoyi Zhu, School of Data Science, Fudan University, Shanghai, China, EMAIL; Zengfeng Huang, School of Data Science, Fudan University, Shanghai, China, EMAIL"
Pseudocode | Yes | "The learning process is summarized in Algorithms 1 and 2. Algorithm 1: Logarithmic Space Lipschitz for Each Round (Round Func). Input: time horizon T; current time t; maximum depth m; current depth h; current cube C; comparison arm reward µ_{m−1}; current round max reward µ_m. Algorithm 2: Logarithmic Space Lipschitz (Log-Li). Input: arm set A = [0, 1]^d; time horizon T."
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | "In the experiment, the time horizon T = 100,000, the arm space is [0, 1]^2 and the expected reward function is µ(x) = 1 − ‖x − x1‖₂ − 0.5‖x − x2‖₂ for different values of x1 and x2." The paper uses a synthetic reward function for its experiments rather than an external, publicly available dataset.
Dataset Splits | No | The paper uses a synthetic reward function for its experiments rather than an external dataset, so no dataset splits (training/test/validation) are applicable or mentioned.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation of the algorithms or experiments.
Experiment Setup | Yes | "In the experiment, the time horizon T = 100,000, the arm space is [0, 1]^2 and the expected reward function is µ(x) = 1 − ‖x − x1‖₂ − 0.5‖x − x2‖₂ for different values of x1 and x2."
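The quoted setup can be sketched in Python. The norm-based form of µ(x) is a hedged reconstruction from the garbled PDF extraction above (the subtraction of two Euclidean distances is an assumption, not the paper's verbatim formula), and the example centers passed to `mu` below are illustrative, not values from the paper.

```python
import numpy as np

def mu(x, x1, x2):
    """Reconstructed synthetic expected reward over the arm space [0, 1]^2.

    Assumed form: mu(x) = 1 - ||x - x1||_2 - 0.5 * ||x - x2||_2,
    where x1 and x2 are reward centers varied across experiments.
    """
    x, x1, x2 = (np.asarray(v, dtype=float) for v in (x, x1, x2))
    return 1.0 - np.linalg.norm(x - x1) - 0.5 * np.linalg.norm(x - x2)

# Illustrative evaluation at the primary center x1 (hypothetical values):
# at x = x1 the first distance term vanishes, leaving 1 - 0.5 * ||x1 - x2||_2.
reward_at_peak = mu((0.8, 0.8), x1=(0.8, 0.8), x2=(0.2, 0.2))
```

Under this reconstruction, the maximizer sits near x1, and a bandit algorithm over [0, 1]^2 with T = 100,000 rounds would receive noisy observations of `mu` at the arms it pulls.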