Continuous Mean-Covariance Bandits

Authors: Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results also demonstrate the superiority of our algorithms. In this section, we present experimental results for our algorithms on both synthetic and real-world [20] datasets.
Researcher Affiliation | Academia | Yihan Du, IIIS, Tsinghua University, Beijing, China (duyh18@mails.tsinghua.edu.cn); Siwei Wang, CST, Tsinghua University, Beijing, China (wangsw2020@mail.tsinghua.edu.cn); Zhixuan Fang, IIIS, Tsinghua University, Beijing, China and Shanghai Qi Zhi Institute, Shanghai, China (zfang@mail.tsinghua.edu.cn); Longbo Huang, IIIS, Tsinghua University, Beijing, China (longbohuang@mail.tsinghua.edu.cn)
Pseudocode | Yes | Algorithm 1 MC-Empirical; Algorithm 2 MC-UCB; Algorithm 3 MC-ETE (a hedged sketch of the objective these algorithms optimize appears after the table)
Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code for its method is publicly available.
Open Datasets | Yes | For the real-world dataset, we use an open dataset US Funds from Yahoo Finance on Kaggle [20], which provides financial data of 1680 ETF funds in 2010-2017. [20] Stefano Leone. US Funds dataset from Yahoo Finance. Kaggle, 2020. https://www.kaggle.com/stefanoleone992/mutual-funds-and-etfs?select=ETFs.csv
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. In bandit problems, data is generated through interaction with the environment, so static dataset splits are not typically defined.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For the synthetic dataset, we set θ* = [0.2, 0.3, 0.2, 0.2, 0.2], and Σ* has all diagonal entries equal to 1 and all off-diagonal entries equal to 0.05. For both datasets, we set d = 5 and ρ ∈ {0.1, 10}. The random reward θ_t is drawn i.i.d. from the Gaussian distribution N(θ*, Σ*). We perform 50 independent runs for each algorithm and show the average regret and 95% confidence interval across runs. (A simulation sketch of this synthetic setup also appears after the table.)
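The algorithm names in the Pseudocode row come from the paper itself. As a reference point only, below is a minimal sketch, assuming the continuous mean-covariance objective f(w) = wᵀθ − ρ·wᵀΣw and a probability-simplex decision set (the paper allows more general sets), of an empirical plug-in maximization step. The function names are hypothetical, and this is not the paper's MC-Empirical, MC-UCB, or MC-ETE.

```python
import numpy as np
from scipy.optimize import minimize


def mean_cov_objective(w, theta, Sigma, rho):
    # Mean-covariance trade-off: expected reward minus rho times risk.
    return w @ theta - rho * (w @ Sigma @ w)


def empirical_best_weight(theta_hat, Sigma_hat, rho):
    # Maximize the empirical objective over the probability simplex.
    # theta_hat / Sigma_hat are plug-in estimates a learner would maintain
    # from observed rewards; the simplex constraint is an assumption here.
    d = len(theta_hat)
    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * d
    result = minimize(
        lambda w: -mean_cov_objective(w, theta_hat, Sigma_hat, rho),
        x0=np.full(d, 1.0 / d),
        bounds=bounds,
        constraints=constraints,
    )
    return result.x
```

The paper's algorithms differ from this plug-in step in how they handle estimation error; judging by their names, MC-UCB suggests a confidence-bound strategy and MC-ETE an explore-then-exploit schedule, but the exact constructions are specified in the paper.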
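The Experiment Setup row can be mirrored at the level of the environment with the short simulation below. It is a sketch under stated assumptions: the decision set is taken to be the probability simplex, the horizon T = 10,000 is a placeholder (the paper's horizon is not quoted above), the maximizer is a crude Dirichlet random search, and the played policy is a uniform-weight placeholder rather than the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup quoted in the Experiment Setup row.
d = 5
theta_star = np.array([0.2, 0.3, 0.2, 0.2, 0.2])
Sigma_star = np.full((d, d), 0.05) + 0.95 * np.eye(d)  # diagonal 1, off-diagonal 0.05


def objective(w, theta, Sigma, rho):
    return w @ theta - rho * (w @ Sigma @ w)


def approx_best_weight(theta, Sigma, rho, n_samples=20_000):
    # Crude maximizer over the probability simplex via Dirichlet random search
    # (a stand-in for a proper solver; the simplex decision set is an assumption).
    cands = rng.dirichlet(np.ones(d), size=n_samples)
    values = cands @ theta - rho * np.einsum("ij,jk,ik->i", cands, Sigma, cands)
    return cands[np.argmax(values)]


def run_once(rho, T=10_000):
    w_star = approx_best_weight(theta_star, Sigma_star, rho)
    opt_value = objective(w_star, theta_star, Sigma_star, rho)
    regret = np.zeros(T)
    for t in range(T):
        theta_t = rng.multivariate_normal(theta_star, Sigma_star)  # random reward vector for round t
        w_t = np.full(d, 1.0 / d)  # placeholder policy; the paper plays its algorithms' choices here
        _observed = w_t @ theta_t  # what a learner would observe and use to update its estimates
        regret[t] = opt_value - objective(w_t, theta_star, Sigma_star, rho)  # pseudo-regret w.r.t. true parameters
    return np.cumsum(regret)


# 50 independent runs, mean regret and a normal-approximation 95% confidence interval,
# mirroring how the paper reports its curves.
runs = np.stack([run_once(rho=0.1) for _ in range(50)])
mean_regret = runs.mean(axis=0)
ci95 = 1.96 * runs.std(axis=0, ddof=1) / np.sqrt(len(runs))
```

Swapping the placeholder policy for an actual learner (e.g., the empirical plug-in step sketched above) and repeating the loop for ρ = 10 reproduces the structure, though not the exact numbers, of the paper's synthetic experiment.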