Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation
Authors: Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity problems." See also Section 4, "Experiments". |
| Researcher Affiliation | Academia | (1) School of Computer Science, Peking University; (2) Center for Statistical Science, Peking University; (3) School of Data Science, Fudan University; (4) Beijing Academy of Artificial Intelligence. |
| Pseudocode | Yes | The detailed steps of COREP are outlined in Algorithm C.1. |
| Open Source Code | Yes | The code is available at https://github.com/PKURL/COREP. |
| Open Datasets | Yes | The experiments are conducted on various environments from the DeepMind Control Suite (Tassa et al., 2018), which is a widely used benchmark for RL algorithms. |
| Dataset Splits | No | The paper mentions collecting trajectories and updating replay buffers but does not explicitly state specific train, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | CPU: Intel I9-12900K@3.2GHz (24 Cores); GPU: Nvidia RTX 3090 (24GB) ×2; RAM: 256GB |
| Software Dependencies | No | The paper lists software libraries used such as 'Numpy', 'PyTorch', 'PyTorch Geometric', 'DeepMind Control', and 'OpenAI Gym' but does not specify their version numbers. |
| Experiment Setup | Yes | We list the hyperparameters for MLP, GAT, and VAE structures in Table C.1, and the hyperparameters for policy optimization and training in Table C.2. |