Robust Knowledge Transfer in Tiered Reinforcement Learning

Authors: Jiawei Huang, Niao He

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct experiments on toy examples to verify our theoretical results.
Researcher Affiliation | Academia | Jiawei Huang, Department of Computer Science, ETH Zurich (jiawei.huang@inf.ethz.ch); Niao He, Department of Computer Science, ETH Zurich
Pseudocode | Yes | Algorithm 1: Robust Tiered MAB; Algorithm 2: Robust Tiered RL; Algorithm 3: Robust Tiered MAB with Multiple Source Tasks
Open Source Code | Yes | Code is available at https://github.com/jiaweihhuang/Robust-Tiered-RL
Open Datasets | No | The paper describes the construction of 'toy examples' and the parameters of the simulated environment, but it does not name, link, or cite any publicly available dataset that can be accessed for training or other purposes.
Dataset Splits | No | The paper defines parameters for its simulated environment (e.g., S = 3, A = 3, H = 5) and mentions evaluation, but it does not specify any train/validation/test splits, since it uses simulated 'toy examples' rather than a pre-existing dataset that would require such splits.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions adapting 'Strong Euler in [23] as online learning algorithm' and using 'the bonus function in [23]', but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We set S = A = 3 and H = 5. Details of the construction of the source and target tasks are deferred to Appx. G. We adapt Strong Euler [23] as the online learning algorithm for solving the source tasks, and use the bonus function in [23] as the bonus function in our Alg. 7. ... We choose λ = 0.3 ≤ 1/S in Alg. 7, and in the MDP instance we test, ... We choose the iteration number K = 1e7 and start the transfer from k = 5e5 to avoid large burn-in terms.
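The Experiment Setup row reports the hyperparameters of the toy experiments. As a minimal sketch, assuming nothing about how the released code actually organizes them, they can be gathered into a small Python config that checks that λ = 0.3 lies below 1/S (here 1/3) and that transfer is only enabled from k = 5e5 onward. All names (`TieredRLConfig`, `transfer_active`, and the field names) are hypothetical and are not taken from the Robust-Tiered-RL repository.

```python
from dataclasses import dataclass


# Hypothetical container for the toy-experiment hyperparameters reported in
# the Experiment Setup row; field names are illustrative, not the repo's.
@dataclass(frozen=True)
class TieredRLConfig:
    num_states: int = 3               # S
    num_actions: int = 3              # A
    horizon: int = 5                  # H
    lam: float = 0.3                  # λ used in Alg. 7
    num_iterations: int = int(1e7)    # K
    transfer_start: int = int(5e5)    # iteration from which transfer is enabled

    def __post_init__(self) -> None:
        # With S = 3, λ = 0.3 is below 1/S = 1/3.
        if not self.lam <= 1.0 / self.num_states:
            raise ValueError("lam must not exceed 1/S")
        if not 0 <= self.transfer_start < self.num_iterations:
            raise ValueError("transfer_start must lie in [0, K)")

    def transfer_active(self, k: int) -> bool:
        """Return True once iteration k has reached the transfer start point.

        Assumed for illustration: before this point the target task is learned
        without source-task information, which avoids large burn-in terms.
        """
        return k >= self.transfer_start


if __name__ == "__main__":
    cfg = TieredRLConfig()
    print(cfg.lam <= 1 / cfg.num_states)  # True
    print(cfg.transfer_active(499_999))   # False
    print(cfg.transfer_active(500_000))   # True
```

Validating in `__post_init__` rejects an invalid setting at construction time rather than partway through a 1e7-iteration run.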