Robust Knowledge Transfer in Tiered Reinforcement Learning
Authors: Jiawei Huang, Niao He
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct experiments in toy examples to verify our theoretical results. |
| Researcher Affiliation | Academia | Jiawei Huang Department of Computer Science ETH Zurich jiawei.huang@inf.ethz.ch Niao He Department of Computer Science ETH Zurich niao.he@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1: Robust Tiered MAB, Algorithm 2: Robust Tiered RL, Algorithm 3: Robust Tiered MAB with Multiple Source Tasks |
| Open Source Code | Yes | Code is available at https://github.com/jiaweihhuang/Robust-Tiered-RL |
| Open Datasets | No | The paper describes the construction of 'toy examples' and parameters for the simulated environment but does not specify a publicly available or open dataset by name, link, or citation that can be accessed for training or other purposes. |
| Dataset Splits | No | The paper defines parameters for its simulated environment (e.g., S=3, A=3, H=5) and mentions evaluation, but it does not specify any train/validation/test dataset splits, as it primarily uses simulated 'toy examples' rather than a pre-existing dataset that would require such splits. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions adapting 'Strong Euler in [23] as online learning algorithm' and using 'the bonus function in [23]', but it does not list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We set S = A = 3 and H = 5. The details for the construction of source and target tasks are deferred to Appx. G. We adapt Strong Euler in [23] as the online learning algorithm to solve source tasks, and use the bonus function in [23] as the bonus function in our Alg. 7. ... We choose λ = 0.3 · 1/S in Alg. 7, and in the MDP instance we test, ... We choose iteration number K = 1e7, and we start the transfer at k = 5e5 to avoid large burn-in terms. |
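The experiment-setup row above fixes a small set of hyperparameters (S = A = 3, H = 5, λ = 0.3 · 1/S, K = 1e7 iterations, transfer starting at k = 5e5). A minimal sketch of how that configuration could be expressed in code is shown below; all names (`make_experiment_config`, `transfer_active`) are illustrative assumptions, not taken from the authors' repository:

```python
def make_experiment_config():
    """Hypothetical config mirroring the toy-example setup reported above."""
    S, A, H = 3, 3, 5  # states, actions, and horizon of the toy MDP
    return {
        "num_states": S,
        "num_actions": A,
        "horizon": H,
        # Reading the reported "lambda = 0.3 * 1/S" literally:
        "lambda": 0.3 / S,
        "num_iterations": int(1e7),   # K = 1e7
        "transfer_start": int(5e5),   # delay transfer to avoid burn-in terms
    }

def transfer_active(k, config):
    """Return True once iteration k has passed the burn-in period."""
    return k >= config["transfer_start"]
```

Delaying the transfer until `transfer_start` matches the paper's stated motivation of avoiding large burn-in terms in the regret analysis.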