reproducibilityindex.ai

Robust Knowledge Transfer in Tiered Reinforcement Learning

Authors: Jiawei Huang, Niao He

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we conduct experiments on toy examples to verify our theoretical results.
Researcher Affiliation	Academia	Jiawei Huang, Department of Computer Science, ETH Zurich (jiawei.huang@inf.ethz.ch); Niao He, Department of Computer Science, ETH Zurich
Pseudocode	Yes	Algorithm 1: Robust Tiered MAB, Algorithm 2: Robust Tiered RL, Algorithm 3: Robust Tiered MAB with Multiple Source Tasks
Open Source Code	Yes	Code is available at https://github.com/jiaweihhuang/Robust-Tiered-RL
Open Datasets	No	The paper describes the construction of 'toy examples' and parameters for the simulated environment but does not specify a publicly available or open dataset by name, link, or citation that can be accessed for training or other purposes.
Dataset Splits	No	The paper defines parameters for its simulated environment (e.g., S=3, A=3, H=5) and mentions evaluation, but it does not specify any train/validation/test dataset splits, as it primarily uses simulated 'toy examples' rather than a pre-existing dataset that would require such splits.
Hardware Specification	No	The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies	No	The paper mentions adapting 'Strong Euler in [23] as online learning algorithm' and using 'the bonus function in [23]', but it does not list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup	Yes	We set S = A = 3 and H = 5. The details of the construction of the source and target tasks are deferred to Appx. G. We adapt Strong Euler in [23] as the online learning algorithm to solve the source tasks, and use the bonus function in [23] as the bonus function in our Alg. 7. ... We choose λ = 0.3 (below 1/S = 1/3) in Alg. 7, and in the MDP instance we test, ... We set the number of iterations to K = 1e7 and start the transfer from k = 5e5 to avoid large burn-in terms.
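The settings in the Experiment Setup row can be collected into a single configuration object for a reproduction attempt. The following is a minimal sketch under stated assumptions and is not code from the Robust-Tiered-RL repository. The class name TieredRLExperimentConfig, its field names and its helper methods are all hypothetical. The check that λ lies strictly between 0 and 1/S is an assumption drawn from the setup row's comparison of λ with 1/S; the paper's exact condition is not quoted here. The only numbers taken from the source are S = A = 3, H = 5, λ = 0.3, K = 1e7 and the transfer start k = 5e5.

```python
from dataclasses import dataclass


# Hypothetical container for the quoted experiment settings; the field
# names are illustrative and do not come from the authors' repository.
@dataclass(frozen=True)
class TieredRLExperimentConfig:
    num_states: int = 3                 # S
    num_actions: int = 3                # A
    horizon: int = 5                    # H
    lam: float = 0.3                    # λ passed to Alg. 7
    num_iterations: int = 10_000_000    # K = 1e7
    transfer_start: int = 500_000       # transfer begins at k = 5e5

    def validate(self) -> None:
        # ASSUMPTION: λ is required to lie in (0, 1/S); with S = 3, 0.3 < 1/3.
        if not 0 < self.lam < 1 / self.num_states:
            raise ValueError("lam should lie in (0, 1/S)")
        # The transfer must begin before the final iteration K.
        if not 0 <= self.transfer_start < self.num_iterations:
            raise ValueError("transfer_start must be in [0, num_iterations)")

    def transfer_active(self, k: int) -> bool:
        """Return True if knowledge transfer is switched on at iteration k."""
        return k >= self.transfer_start

    def burn_in_fraction(self) -> float:
        """Fraction of the K iterations that run without transfer."""
        return self.transfer_start / self.num_iterations


cfg = TieredRLExperimentConfig()
cfg.validate()
print(cfg.transfer_active(499_999), cfg.transfer_active(500_000))
```

With the quoted values, 5e5 of the 1e7 iterations (5%) run before transfer starts. A frozen dataclass keeps the settings immutable, so one run cannot silently change K or λ partway through.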