Provably Efficient Lifelong Reinforcement Learning with Linear Representation
Authors: Sanae Amani, Lin Yang, Ching-An Cheng
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented our main algorithm UCBlvd on synthetic environments and compared its performance with the warm-up algorithm Lifelong-LSVI, which serves as an idealized baseline that ignores computational complexity. In all experiments, the same setting, task sequences, and feature mappings were used for both UCBlvd and Lifelong-LSVI. Figure 1a depicts per-episode rewards for the main setup considered throughout the paper, and Figure 1b shows those for the setup in Remark 2. |
| Researcher Affiliation | Collaboration | Sanae Amani, University of California, Los Angeles (samani@ucla.edu); Lin F. Yang, University of California, Los Angeles (linyang@ee.ucla.edu); Ching-An Cheng, Microsoft Research, Redmond (chinganc@microsoft.com) |
| Pseudocode | Yes | Algorithm 1: Lifelong-LSVI; Algorithm 2: UCBlvd (UCB Lifelong Value Distillation); Algorithm 3: UCBlvd with Unknown Rewards; Algorithm 4: Modified UCBlvd; Algorithm 5: Standard Lifelong-LSVI with Computation Sharing |
| Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using 'synthetic environments' and 'parameters drawn from N(0, I_d)', which indicates data generation rather than the use of a publicly available dataset with access information. No specific training data split information or access details are provided. |
| Dataset Splits | No | The paper mentions using 'synthetic environments' but does not provide specific dataset split information (e.g., percentages, counts, or references to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper describes experimental setup parameters but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific solver names). It only implies the use of software for simulation. |
| Experiment Setup | Yes | In all the experiments, we have chosen δ = 0.01, λ = 1, d = 5, and H = 5. The parameters {η_h}_{h∈[H]} are drawn from N(0, I_d). ... For the results shown in Figure 2a, the mappings ρ(w) are drawn from N(0, I_m) except for the n = m representative tasks {w^(j)}_{j∈[m]} introduced in Assumption 3, for which we set ρ(w^(j)) = e_j for j ∈ [m]. |
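
Since the paper's code is not public, the synthetic setup quoted above can only be approximated. The sketch below, a minimal reconstruction assuming NumPy, draws the per-step reward parameters η_h ~ N(0, I_d) and the task mappings ρ(w), fixing the m representative tasks to the standard basis vectors e_j as stated; all function names here are our own, not the authors'.

```python
import numpy as np

def make_reward_params(d=5, H=5, seed=0):
    """Draw per-step parameters eta_h ~ N(0, I_d) for h in [H],
    matching the reported setup (d = 5, H = 5)."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal(d) for _ in range(H)]

def make_task_mappings(m, n_extra_tasks, seed=1):
    """Draw task mappings rho(w) ~ N(0, I_m), except the m representative
    tasks, which are pinned to rho(w^(j)) = e_j (Assumption 3)."""
    rng = np.random.default_rng(seed)
    representatives = [np.eye(m)[j] for j in range(m)]   # e_1, ..., e_m
    sampled = [rng.standard_normal(m) for _ in range(n_extra_tasks)]
    return representatives + sampled

# Example instantiation with the paper's reported dimensions.
etas = make_reward_params(d=5, H=5)
rhos = make_task_mappings(m=5, n_extra_tasks=3)
```

The regularization λ = 1 and confidence level δ = 0.01 would enter the algorithm itself (the ridge-regression covariance and the bonus terms), not this data-generation step.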