Online Stochastic Linear Optimization under One-bit Feedback
Authors: Lijun Zhang, Tianbao Yang, Rong Jin, Yichi Xiao, Zhi-Hua Zhou
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results to demonstrate the effectiveness of the proposed algorithm. |
| Researcher Affiliation | Collaboration | Lijun Zhang (ZHANGLJ@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Tianbao Yang (TIANBAO-YANG@UIOWA.EDU), Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Rong Jin (JINRONG.JR@ALIBABA-INC.COM), Alibaba Group, Seattle, USA; Yichi Xiao (XIAOYC@LAMDA.NJU.EDU.CN) and Zhi-Hua Zhou (ZHOUZH@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China |
| Pseudocode | Yes | Algorithm 1 Online Learning for Logit Model (OL2M) |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code will be released. |
| Open Datasets | No | We sample a point uniformly at random from the $(d-1)$-sphere as $w$, and each time the learner submits an action $x_t$, a one-bit feedback $y_t \in \{\pm 1\}$ is generated according to the logit model in (3). [...] The decision set $\mathcal{D} \subseteq \mathbb{R}^d$ is constructed by sampling $10d$ points uniformly at random from the $(d-1)$-sphere. |
| Dataset Splits | No | The paper describes an online learning setting where data is revealed sequentially. It does not explicitly define or provide details for traditional training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'CVX package' but does not specify a version number for it, nor does it list other software dependencies with specific versions. |
| Experiment Setup | Yes | To apply our algorithm, we need to determine the values of two parameters: $\lambda$ and $\gamma_t$. $\lambda$ is introduced to make $Z_t$ invertible, and the performance of our algorithm is insensitive to its value. Thus, we simply choose $\lambda = 1$ in the following. $\gamma_t$ is an essential parameter which is the width of the confidence region, and its value is tuned as $c \log \frac{\det(Z_t)}{\det(Z_1)}$ according to (12), where $c$ is searched in the range of [1e-3, 1]. |
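
The synthetic setup quoted in the Open Datasets row can be sketched in a few lines, since the paper releases no code. The following is a minimal illustration, not the authors' implementation. It assumes the logit model in (3) has the standard form Pr[y = ±1 | x] = 1 / (1 + exp(-y x·w)), which the excerpt above does not reproduce. The function names, the seed, the dimension `d = 5`, and the random choice of action are all hypothetical placeholders.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the (d-1)-sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def make_decision_set(d, rng):
    """Build the decision set: 10*d points sampled uniformly from the sphere."""
    return [sample_unit_sphere(d, rng) for _ in range(10 * d)]


def one_bit_feedback(x, w, rng):
    """Return y in {+1, -1}, with P(y = +1 | x) = 1 / (1 + exp(-x.w)).

    This logistic form is an assumption about equation (3) of the paper.
    """
    inner = sum(a * b for a, b in zip(x, w))
    p_plus = 1.0 / (1.0 + math.exp(-inner))
    return 1 if rng.random() < p_plus else -1


rng = random.Random(0)               # fixed seed, chosen only for repeatability
d = 5                                # illustrative dimension, not from the paper
w = sample_unit_sphere(d, rng)       # hidden parameter vector
D = make_decision_set(d, rng)        # 10*d candidate actions
x_t = rng.choice(D)                  # placeholder action; a learner would pick this
y_t = one_bit_feedback(x_t, w, rng)  # one-bit feedback observed by the learner
```

Normalizing a standard Gaussian vector yields a uniform draw on the sphere because the Gaussian density is rotation invariant. A real experiment would replace the random choice of `x_t` with the action selected by the learning algorithm at round t.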