Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

Authors: Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results further demonstrate the superior performance of COREP over existing methods in tackling non-stationarity problems." and the section heading "4. Experiments"
Researcher Affiliation | Academia | "(1) School of Computer Science, Peking University; (2) Center for Statistical Science, Peking University; (3) School of Data Science, Fudan University; (4) Beijing Academy of Artificial Intelligence"
Pseudocode | Yes | "The detailed steps of COREP are outlined in Algorithm C.1."
Open Source Code | Yes | "The code is available at https://github.com/PKURL/COREP."
Open Datasets | Yes | "The experiments are conducted on various environments from the DeepMind Control Suite (Tassa et al., 2018), which is a widely used benchmark for RL algorithms."
Dataset Splits | No | The paper mentions collecting trajectories and updating replay buffers but does not explicitly state train, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | CPU: Intel i9-12900K @ 3.2 GHz (24 cores); GPU: Nvidia RTX 3090 (24 GB) × 2; RAM: 256 GB
Software Dependencies | No | The paper lists software libraries such as NumPy, PyTorch, PyTorch Geometric, DeepMind Control, and OpenAI Gym but does not specify their version numbers.
Experiment Setup | Yes | "We list the hyperparameters for MLP, GAT, and VAE structures in Table C.1, and the hyperparameters for policy optimization and training in Table C.2."
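The Open Datasets row above points to the DeepMind Control Suite. As a minimal sketch of what loading one of its environments looks like, the snippet below uses the public dm_control API; the cheetah/run task and the random-action rollout are illustrative assumptions, not environments or settings taken from the paper.

```python
# Minimal sketch: load and step a DeepMind Control Suite task.
# The domain/task pair is illustrative only; the paper evaluates on
# "various environments" from the suite.
import numpy as np
from dm_control import suite

env = suite.load(domain_name="cheetah", task_name="run", task_kwargs={"random": 0})

action_spec = env.action_spec()
time_step = env.reset()

total_reward = 0.0
while not time_step.last():
    # Sample a uniformly random action within the action bounds.
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    total_reward += time_step.reward or 0.0

print("Episode return with random actions:", total_reward)
```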
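Since the Software Dependencies row notes that no version numbers are given, a reproduction attempt has to pin its own environment. Below is a minimal sketch for recording installed versions of the named libraries, assuming their usual PyPI distribution names (numpy, torch, torch-geometric, dm-control, gym); those names are an assumption, not something the paper specifies.

```python
# Minimal sketch: record installed versions of the libraries the paper names,
# since no versions are pinned. Distribution names are assumed, not quoted
# from the paper.
from importlib.metadata import version, PackageNotFoundError

packages = ["numpy", "torch", "torch-geometric", "dm-control", "gym"]

for pkg in packages:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```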
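The Experiment Setup row cites Tables C.1 and C.2 for the MLP, GAT, and VAE structure hyperparameters and the policy-optimization/training settings, but the tables themselves are not reproduced here. The sketch below only illustrates one way such a configuration could be organized; every field name and value is a hypothetical placeholder, not a value from the paper.

```python
# Hypothetical configuration layout for a COREP-style run. The grouping
# (MLP / GAT / VAE / training) mirrors how the paper's Tables C.1 and C.2
# are described; all names and values here are placeholders, not the
# paper's actual hyperparameters.
from dataclasses import dataclass, field

@dataclass
class ModelConfig:
    mlp_hidden_sizes: tuple = (256, 256)   # placeholder
    gat_heads: int = 4                     # placeholder
    gat_hidden_dim: int = 64               # placeholder
    vae_latent_dim: int = 32               # placeholder

@dataclass
class TrainConfig:
    learning_rate: float = 3e-4            # placeholder
    batch_size: int = 256                  # placeholder
    replay_buffer_size: int = 1_000_000    # placeholder
    total_env_steps: int = 1_000_000       # placeholder

@dataclass
class ExperimentConfig:
    model: ModelConfig = field(default_factory=ModelConfig)
    train: TrainConfig = field(default_factory=TrainConfig)

config = ExperimentConfig()
print(config)
```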