Provably Efficient Lifelong Reinforcement Learning with Linear Representation

Authors: Sanae Amani, Lin Yang, Ching-An Cheng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented our main algorithm, UCBlvd, on synthetic environments and compared its performance with that of the warm-up algorithm Lifelong-LSVI, which serves as an idealized baseline that ignores computational complexity. In all experiments, the same setting, task sequences, and feature mappings were used for both UCBlvd and Lifelong-LSVI. Figure 1a depicts per-episode rewards for the main setup considered throughout the paper, and Figure 1b shows those for the setup in Remark 2.
Researcher Affiliation | Collaboration | Sanae Amani, University of California, Los Angeles (samani@ucla.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu); Ching-An Cheng, Microsoft Research, Redmond (chinganc@microsoft.com)
Pseudocode | Yes | Algorithm 1: Lifelong-LSVI; Algorithm 2: UCBlvd (UCB Lifelong Value Distillation); Algorithm 3: UCBlvd with Unknown Rewards; Algorithm 4: Modified UCBlvd; Algorithm 5: Standard Lifelong-LSVI with Computation Sharing
Open Source Code | No | The paper does not provide a direct link to, or an explicit statement about the availability of, open-source code for the described methodology.
Open Datasets | No | The paper mentions using 'synthetic environments' with parameters drawn from N(0, I_d), which indicates data generation rather than the use of a publicly available dataset with access information. No training-data split information or access details are provided.
Dataset Splits | No | The paper mentions using 'synthetic environments' but does not provide specific dataset split information (e.g., percentages, counts, or references to predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper describes experimental setup parameters but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific solver names). It only implies the use of software for simulation.
Experiment Setup | Yes | In all the experiments, we have chosen δ = 0.01, λ = 1, d = 5, and H = 5. The parameters {η_h}_{h∈[H]} are drawn from N(0, I_d). ... For the results shown in Figure 2a, the mappings ρ(w) are drawn from N(0, I_m), except for the n = m representative tasks {w^(j)}_{j∈[m]} introduced in Assumption 3, for which we set ρ(w^(j)) = e_j for j ∈ [m].
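The quoted setup can be sketched as a short data-generation snippet. This is a hypothetical illustration in NumPy, not the authors' code: all variable names are ours, the seed is arbitrary, and the paper's actual environment dynamics and reward construction are omitted.

```python
import numpy as np

# Constants quoted from the paper's experiment description.
delta, lam, d, H = 0.01, 1.0, 5, 5
m = d  # assumed: n = m representative tasks (Assumption 3)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Reward parameters eta_h ~ N(0, I_d), one d-dimensional vector per step h in [H].
eta = rng.standard_normal((H, d))

# Task mappings rho(w): the m representative tasks get standard basis
# vectors, rho(w^(j)) = e_j; any other task is drawn from N(0, I_m).
rho_representative = np.eye(m)       # row j is e_j
rho_new_task = rng.standard_normal(m)
```

This mirrors only the parameter-sampling step of the synthetic setup; how these vectors enter the features and transition kernel is specified in the paper itself.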