Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Provably Efficient Lifelong Reinforcement Learning with Linear Representation
Authors: Sanae Amani, Lin Yang, Ching-An Cheng
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented our main algorithm UCBlvd on synthetic environments and compared its performance with the warm-up algorithm Lifelong-LSVI, which is viewed as an idealized baseline ignoring the computational complexity. In all the experiments, the same setting, task sequences and feature mappings were used for both UCBlvd and Lifelong-LSVI. Figure 1a depicts per-episode rewards for the main setup considered throughout the paper, and Figure 1b shows those for the setup in Remark 2. |
| Researcher Affiliation | Collaboration | Sanae Amani (University of California, Los Angeles); Lin F. Yang (University of California, Los Angeles); Ching-An Cheng (Microsoft Research, Redmond) |
| Pseudocode | Yes | Algorithm 1: Lifelong-LSVI; Algorithm 2: UCBlvd (UCB Lifelong Value Distillation); Algorithm 3: UCBlvd with Unknown Rewards; Algorithm 4: Modified UCBlvd; Algorithm 5: Standard Lifelong-LSVI with Computation Sharing |
| Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper mentions using 'synthetic environments' and 'parameters drawn from $\mathcal{N}(0, I_d)$', which indicates data generation rather than the use of a publicly available dataset with access information. No specific training data split information or access details are provided. |
| Dataset Splits | No | The paper mentions using 'synthetic environments' but does not provide specific dataset split information (e.g., percentages, counts, or references to predefined splits) for training, validation, or testing. |
| Hardware Specification | No | The paper describes experimental setup parameters but does not specify the hardware (e.g., CPU, GPU models, or cloud resources) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or specific solver names). It only implies the use of software for simulation. |
| Experiment Setup | Yes | In all the experiments, we have chosen $\delta = 0.01$, $\lambda = 1$, $d = 5$, and $H = 5$. The parameters $\{\eta_h\}_{h \in [H]}$ are drawn from $\mathcal{N}(0, I_d)$. ... For the results shown in Figure 2a, the mappings $\rho(w)$ are drawn from $\mathcal{N}(0, I_m)$ except for the $n = m$ representative tasks $\{w^{(j)}\}_{j \in [m]}$ introduced in Assumption 3, for which we set $\rho(w^{(j)}) = e_j$ for $j \in [m]$. |
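
The sampling described in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the authors' code, which the paper does not release. The values of δ, λ, d, and H come from the quoted setup. The value of m, the number of tasks, the seed, the choice to list the representative tasks first, and all function names are assumptions made for this example only.

```python
import random

# Values quoted in the Experiment Setup row.
DELTA = 0.01   # confidence parameter delta
LAMBDA = 1.0   # regularization parameter lambda
D = 5          # feature dimension d
H = 5          # horizon H

# Hypothetical values, NOT given in the summary above.
M = 3          # assumed dimension m of the task mapping rho(w)
NUM_TASKS = 6  # assumed total number of tasks
SEED = 0       # assumed seed, for repeatability of this sketch


def standard_normal_vector(rng, dim):
    """Draw one sample from N(0, I_dim) as a list of floats."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def basis_vector(j, dim):
    """Return the j-th standard basis vector e_j (0-indexed)."""
    return [1.0 if i == j else 0.0 for i in range(dim)]


def sample_setup(rng, d=D, horizon=H, m=M, num_tasks=NUM_TASKS):
    """Sample eta_h for each step h and rho(w) for each task.

    Assumption: the first m tasks play the role of the representative
    tasks, so rho is set to e_j for them; all other tasks get a draw
    from N(0, I_m).
    """
    etas = [standard_normal_vector(rng, d) for _ in range(horizon)]
    rhos = [
        basis_vector(j, m) if j < m else standard_normal_vector(rng, m)
        for j in range(num_tasks)
    ]
    return etas, rhos


etas, rhos = sample_setup(random.Random(SEED))
```

Here `etas` holds H vectors of dimension d and `rhos` holds one m-dimensional vector per task. δ and λ are recorded only for completeness, since they are used by the algorithms and not by the sampling step.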