Provably Efficient Lifelong Reinforcement Learning with Linear Representation

Authors: Sanae Amani, Lin Yang, Ching-An Cheng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented our main algorithm, UCBlvd, on synthetic environments and compared its performance with that of the warm-up algorithm Lifelong-LSVI, which serves as an idealized baseline that ignores computational complexity. In all experiments, the same setting, task sequences, and feature mappings were used for both UCBlvd and Lifelong-LSVI. Figure 1a depicts per-episode rewards for the main setup considered throughout the paper, and Figure 1b shows those for the setup in Remark 2.
Researcher Affiliation | Collaboration | Sanae Amani, University of California, Los Angeles (samani@ucla.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu); Ching-An Cheng, Microsoft Research, Redmond (chinganc@microsoft.com)
Pseudocode | Yes | Algorithm 1: Lifelong-LSVI; Algorithm 2: UCBlvd (UCB Lifelong Value Distillation); Algorithm 3: UCBlvd with Unknown Rewards; Algorithm 4: Modified UCBlvd; Algorithm 5: Standard Lifelong-LSVI with Computation Sharing
Open Source Code | No | The paper does not provide a direct link to, or an explicit statement about the availability of, open-source code for the described methodology.
Open Datasets | No | The paper mentions using 'synthetic environments' with parameters drawn from N(0, I_d), which indicates data generation rather than the use of a publicly available dataset with access information. No training-data split information or access details are provided.
Dataset Splits | No | The paper mentions using 'synthetic environments' but does not provide specific dataset split information (e.g., percentages, counts, or references to predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper describes experimental setup parameters but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific solver names). It only implies the use of software for simulation.
Experiment Setup | Yes | In all the experiments, we have chosen δ = 0.01, λ = 1, d = 5, and H = 5. The parameters {η_h}_{h∈[H]} are drawn from N(0, I_d). ... For the results shown in Figure 2a, the mappings ρ(w) are drawn from N(0, I_m), except for the n = m representative tasks {w^(j)}_{j∈[m]} introduced in Assumption 3, for which we set ρ(w^(j)) = e_j for j ∈ [m].
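The quoted setup can be sketched as a short data-generation snippet. This is a hypothetical illustration in NumPy, not the authors' code: all variable names are ours, the seed is arbitrary, and the paper's actual environment dynamics and reward construction are omitted.

```python
import numpy as np

# Constants quoted from the paper's experiment description.
delta, lam, d, H = 0.01, 1.0, 5, 5
m = d  # assumed: n = m representative tasks (Assumption 3)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Reward parameters eta_h ~ N(0, I_d), one d-dimensional vector per step h in [H].
eta = rng.standard_normal((H, d))

# Task mappings rho(w): the m representative tasks get standard basis
# vectors, rho(w^(j)) = e_j; any other task is drawn from N(0, I_m).
rho_representative = np.eye(m)       # row j is e_j
rho_new_task = rng.standard_normal(m)
```

This mirrors only the parameter-sampling step of the synthetic setup; how these vectors enter the features and transition kernel is specified in the paper itself.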