Provable General Function Class Representation Learning in Multitask Bandits and MDP
Authors: Rui Lu, Andrew Zhao, Simon S. Du, Gao Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we conduct experiments to demonstrate the effectiveness of our algorithm with neural net representation. Finally, we conduct experiments to verify our theoretical result. We design a neural network based bandit environment and implement the GFUCB algorithm. Experimental results corroborate the effect of multitask representation learning in boosting sample efficiency in non-linear bandits. |
| Researcher Affiliation | Academia | Rui Lu1, Andrew Zhao1, Simon S. Du2, Gao Huang1 1Department of Automation, BNRist, Tsinghua University 2Paul G. Allen School of Computer Science and Engineering, University of Washington {r-lu21,zqc21}@mails.tsinghua.edu.cn ssdu@cs.washington.edu, gaohuang@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Generalized Functional UCB Algorithm; Algorithm 2: Multitask Linear MDP Algorithm |
| Open Source Code | No | The paper does not provide any specific link or statement about open-source code availability for the methodology described. |
| Open Datasets | Yes | To test the efficacy of our algorithm, we use the MNIST dataset [10] to build a bandit problem... [10] Li Deng. The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 29(6):141–142, 2012. (A hypothetical sketch of one such construction follows the table.) |
| Dataset Splits | No | The paper mentions using the MNIST dataset but does not specify the train/validation/test splits used for the experiments. It describes the task design and how contexts are presented, but not the data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "Adam" and "SGD" optimizers and a "simple CNN" but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | For finding the empirically best f̂_t, we use Adam with lr = 1e-3 to train for sufficiently long steps; in our setting, it is set to be 200 epochs at every step t, to ensure that the training loss is sufficiently low. ...we set it to be λ = 30 by empirical search. Also, B_t = a·log(b·t + c) is an approximation for β_t, since β_t includes N(Φ, α), which is intractable to compute exactly; we found (a, b, c) = (0.4, 0.5, 2) to be a good UCB parameter in the single-task setting. We use SGD with a small learning rate (5e-4) to finetune the model f̂_t for 200 iterations to optimize ℓ(f). (These choices are wired together in the sketch below the table.) |
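
The paper builds its bandit environment on top of MNIST, but the quoted material above does not spell out the task design. Purely as an illustration of what such an environment could look like, here is a minimal Python sketch; the reward rule, arm count, and the `MNISTBandit` class are assumptions for exposition, not the authors' construction.

```python
import numpy as np
from torchvision import datasets  # assumes torchvision is available for MNIST access


class MNISTBandit:
    """Hypothetical contextual bandit on MNIST (illustration only): each
    round shows one image per arm, and pulling the arm whose image depicts
    the current target digit pays reward 1, otherwise 0."""

    def __init__(self, n_arms: int = 10, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        ds = datasets.MNIST(root="./data", train=True, download=True)
        self.images = ds.data.numpy()     # (60000, 28, 28) uint8 images
        self.labels = ds.targets.numpy()  # digit labels 0-9
        self.n_arms = n_arms

    def reset(self):
        """Draw fresh arm contexts and a target digit for the new round."""
        idx = self.rng.choice(len(self.images), size=self.n_arms, replace=False)
        self.arm_labels = self.labels[idx]
        self.target = int(self.rng.integers(10))
        return self.images[idx], self.target

    def step(self, arm: int) -> float:
        """Reward 1 iff the chosen arm's image shows the target digit."""
        return float(self.arm_labels[arm] == self.target)
```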
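
The Experiment Setup row, by contrast, does pin down concrete choices, which the sketch below wires together. Only the optimizers, learning rates, iteration counts, and the bonus schedule B_t = a·log(b·t + c) with (a, b, c) = (0.4, 0.5, 2) come from the paper; `model` and `loss_fn` are hypothetical placeholders, since the authors released no code.

```python
import math

import torch
from torch import nn


def ucb_bonus(t: int, a: float = 0.4, b: float = 0.5, c: float = 2.0) -> float:
    """Empirical surrogate B_t = a * log(b*t + c) for the confidence width
    beta_t, whose covering-number term N(Phi, alpha) is intractable to
    compute exactly; (0.4, 0.5, 2) is the paper's single-task choice."""
    return a * math.log(b * t + c)


def fit_f_hat(model: nn.Module, loss_fn, epochs: int = 200) -> None:
    """Fit the empirically best f_hat_t: Adam with lr = 1e-3, run for 200
    epochs at every step t so the training loss gets sufficiently low."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model)  # placeholder for the regression loss on observed rewards
        loss.backward()
        opt.step()


def finetune_f_hat(model: nn.Module, loss_fn, iters: int = 200) -> None:
    """Fine-tune f_hat_t with small-learning-rate SGD (5e-4) for 200
    iterations to optimize the exploration objective l(f)."""
    opt = torch.optim.SGD(model.parameters(), lr=5e-4)
    for _ in range(iters):
        opt.zero_grad()
        loss = loss_fn(model)
        loss.backward()
        opt.step()
```

As a sanity check on the schedule, `ucb_bonus(1)` evaluates to 0.4·log(2.5) ≈ 0.37, and the bonus then grows only logarithmically in t.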