Online Stochastic Linear Optimization under One-bit Feedback

Authors: Lijun Zhang, Tianbao Yang, Rong Jin, Yichi Xiao, Zhi-Hua Zhou

ICML 2016

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we present experimental results to demonstrate the effectiveness of the proposed algorithm.
Researcher Affiliation Collaboration Lijun Zhang (ZHANGLJ@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Tianbao Yang (TIANBAO-YANG@UIOWA.EDU), Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Rong Jin (JINRONG.JR@ALIBABA-INC.COM), Alibaba Group, Seattle, USA; Yichi Xiao (XIAOYC@LAMDA.NJU.EDU.CN) and Zhi-Hua Zhou (ZHOUZH@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Pseudocode Yes Algorithm 1 Online Learning for Logit Model (OL2M)
Open Source Code No The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code will be released.
Open Datasets No We sample a point uniformly at random from the (d−1)-sphere as w, and each time the learner submits an action x_t, a one-bit feedback y_t ∈ {±1} is generated according to the logit model in (3). [...] The decision set D ⊆ R^d is constructed by sampling 10d points uniformly at random from the (d−1)-sphere.
Dataset Splits No The paper describes an online learning setting where data is revealed sequentially. It does not explicitly define or provide details for traditional training, validation, and test dataset splits.
Hardware Specification No The paper does not specify any particular hardware (e.g., GPU models, CPU models, or memory) used for running the experiments.
Software Dependencies No The paper mentions the use of 'CVX package' but does not specify a version number for it, nor does it list other software dependencies with specific versions.
Experiment Setup Yes To apply our algorithm, we need to determine the values of two parameters: λ and γ_t. λ is introduced to make Z_t invertible, and the performance of our algorithm is insensitive to its value. Thus, we simply choose λ = 1 in the following. γ_t is an essential parameter which is the width of the confidence region, and its value is tuned as c log(det(Z_t)/det(Z_1)) according to (12), where c is searched in the range of [1e-3, 1].
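
The synthetic setup quoted in the Open Datasets and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the logit model in (3) has the standard form P(y = ±1 | x) = 1 / (1 + exp(−y xᵀw)), assumes Z_1 = λI with a plain rank-one update Z_{t+1} = Z_t + x_t x_tᵀ (the paper's update may carry an extra constant factor), and picks actions at random as a placeholder instead of running OL2M. The helper names `sample_sphere`, `logit_feedback`, and `confidence_width` are hypothetical.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly at random from the unit (d-1)-sphere."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def logit_feedback(x, w, rng):
    """Return y in {-1, +1} with P(y = +1 | x) = 1 / (1 + exp(-x.w)).
    Assumed form of the logit model; check against (3) in the paper."""
    p_plus = 1.0 / (1.0 + np.exp(-(x @ w)))
    return 1 if rng.random() < p_plus else -1

def confidence_width(Z_t, Z_1, c):
    """gamma_t = c * log(det(Z_t) / det(Z_1)), using log-determinants
    to avoid overflow when d or t is large."""
    _, logdet_t = np.linalg.slogdet(Z_t)
    _, logdet_1 = np.linalg.slogdet(Z_1)
    return c * (logdet_t - logdet_1)

rng = np.random.default_rng(0)
d, lam, c = 10, 1.0, 0.1            # c is one value from [1e-3, 1]

w = sample_sphere(1, d, rng)[0]     # unknown parameter vector
D = sample_sphere(10 * d, d, rng)   # decision set of 10d points

Z_1 = lam * np.eye(d)
Z_t = Z_1.copy()
feedback = []
for t in range(100):
    x_t = D[rng.integers(len(D))]   # placeholder action choice, NOT OL2M
    feedback.append(logit_feedback(x_t, w, rng))
    Z_t += np.outer(x_t, x_t)       # assumed rank-one update

gamma_t = confidence_width(Z_t, Z_1, c)
```

A search over c in [1e-3, 1] could iterate over a logarithmic grid such as `np.logspace(-3, 0, 4)`; the grid actually used in the paper is not stated in the quoted text.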