Provable General Function Class Representation Learning in Multitask Bandits and MDP

Authors: Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Lastly, we conduct experiments to demonstrate the effectiveness of our algorithm with neural net representation. Finally, we conduct experiments to verify our theoretical result. We design a neural network based bandit environment and implement the GFUCB algorithm. Experimental results corroborate the effect of multitask representation learning in boosting sample efficiency in non-linear bandits.
Researcher Affiliation | Academia | Rui Lu (1), Andrew Zhao (1), Simon S. Du (2), Gao Huang (1); (1) Department of Automation, BNRist, Tsinghua University; (2) Paul G. Allen School of Computer Science and Engineering, University of Washington; {r-lu21,zqc21}@mails.tsinghua.edu.cn, ssdu@cs.washington.edu, gaohuang@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 Generalized Functional UCB Algorithm; Algorithm 2 multitask Linear MDP Algorithm
Open Source Code | No | The paper does not provide any specific link or statement about open-source code availability for the methodology described.
Open Datasets | Yes | To test the efficacy of our algorithm, we use the MNIST dataset [10] to build a bandit problem... [10] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141-142, 2012. (A hypothetical sketch of such an environment construction follows the table below.)
Dataset Splits | No | The paper mentions using the MNIST dataset but does not specify the train/validation/test splits used for the experiments. It describes the task design and how contexts are presented, but not the data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using "Adam" and "SGD" optimizers and a "simple CNN" but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | For finding the empirically best f̂_t, we use Adam with lr = 1e-3 to train for sufficiently long steps; in our setting, it is set to be 200 epochs at every step t, to ensure that the training loss is sufficiently low. ...we set it to be λ = 30 by empirical search. Also, B_t = a·log(b·t + c) is an approximation for β_t, since β_t includes N(Φ, α), which is intractable to compute exactly; we found (a, b, c) = (0.4, 0.5, 2) to be a good parameter for the UCB in the single-task setting. We use SGD with a small learning rate (5e-4) to finetune the model f̂_t for 200 iterations to optimize ℓ(f). (A hedged code sketch of this setup appears below.)
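The numbers in the Experiment Setup row translate into a small amount of code. Below is a minimal PyTorch-style sketch, not the authors' released implementation (none is linked): only the optimizer choices (Adam, lr = 1e-3, 200 epochs; SGD, lr = 5e-4, 200 iterations), the bonus form B_t = a·log(b·t + c) with (a, b, c) = (0.4, 0.5, 2), and λ = 30 come from the excerpt, while the "simple CNN" architecture and the function signatures are illustrative assumptions. How the pieces are combined is specified by Algorithm 1; a plausible reading of the excerpt is that f̂_t is fit on all observed data and then fine-tuned to obtain optimistic estimates, with B_t controlling the width of the confidence term.

```python
# Minimal sketch of the reported experiment setup (assumptions noted inline).
import math
import torch
import torch.nn as nn

LAMBDA = 30.0  # lambda = 30, set by empirical search per the paper; its exact role is not restated in this excerpt.

class SimpleCNN(nn.Module):
    """Hypothetical 'simple CNN' reward model f(x) over 28x28 single-channel contexts."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 7 * 7, 1)

    def forward(self, x):
        return self.head(self.features(x)).squeeze(-1)

def ucb_bonus(t, a=0.4, b=0.5, c=2.0):
    # B_t = a * log(b*t + c): the paper's tractable stand-in for beta_t,
    # which involves a covering number N(Phi, alpha) that is hard to compute.
    return a * math.log(b * t + c)

def fit_empirical_best(model, contexts, rewards, epochs=200, lr=1e-3):
    """Fit f_hat_t on all observed (context, reward) pairs with Adam, lr = 1e-3."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        mse(model(contexts), rewards).backward()
        opt.step()
    return model

def finetune_for_loss(model, loss_fn, iters=200, lr=5e-4):
    """Fine-tune f_hat_t with small-lr SGD (5e-4) to optimize a given loss l(f)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss_fn(model).backward()
        opt.step()
    return model
```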
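On the environment side, the excerpt states only that MNIST is used to build a bandit problem and does not reproduce the full task design. The class below is therefore a hypothetical stand-in showing one way image contexts and rewards could be wired together; the arm count, the reward rule (reward 1 when the pulled arm shows a task-specific target digit), and the name MNISTBandit are assumptions for illustration, not the paper's specification.

```python
# Hypothetical MNIST-based bandit environment (illustrative assumptions only).
import numpy as np
from torchvision import datasets, transforms  # assumes torchvision is installed

class MNISTBandit:
    def __init__(self, target_digit, num_arms=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.data = datasets.MNIST(root="./data", train=True, download=True,
                                   transform=transforms.ToTensor())
        self.labels = self.data.targets.numpy()
        self.target_digit = target_digit
        self.num_arms = num_arms

    def reset(self):
        # Each round offers `num_arms` candidate images as the arm contexts.
        self.arm_idx = self.rng.choice(len(self.data), size=self.num_arms,
                                       replace=False)
        return [self.data[i][0] for i in self.arm_idx]

    def step(self, arm):
        # Assumed reward rule: 1 if the chosen image matches the target digit, else 0.
        return float(self.labels[self.arm_idx[arm]] == self.target_digit)
```

In a multitask run, one would presumably instantiate one such environment per task (e.g., a different target digit per task) and pair each with the fitting and fine-tuning routines sketched above.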