Pausing Policy Learning in Non-stationary Reinforcement Learning

Authors: Hyunin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experimental evaluations on three different environments also reveal that a nonzero policy hold duration yields higher rewards compared to continuous decision updates. |
| Researcher Affiliation | Academia | University of California, Berkeley; Virginia Tech. |
| Pseudocode | Yes | Algorithm 1: Forecasting Online Reinforcement Learning. |
| Open Source Code | No | The paper lists existing official codebases used for comparison (e.g., 'Official codes distributed from https://github.com/pranz24/pytorch-soft-actor-critic') but does not state that the authors release open-source code for their own proposed method (FSAC). |
| Open Datasets | No | The paper describes custom environments such as the switching-goal cliffworld and non-stationary modifications of MuJoCo environments. While MuJoCo is a well-known simulator, the specific non-stationary modifications (e.g., a time-varying target velocity vd(t) = a sin(wt)) are not publicly linked or formally cited, so the exact environment setup cannot be accessed without a custom implementation (a minimal wrapper sketch is given after the table). |
| Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the partitioning into train, validation, and test sets. |
| Hardware Specification | Yes | All experiments are conducted on 12 Intel Xeon CPU E5-2690 v4 and 2 Tesla V100 GPUs. |
| Software Dependencies | No | The paper lists the software libraries used (PyTorch, OpenAI Gym, NumPy) but does not provide version numbers for these dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | For our experiments, we varied hyperparameters such as learning rates λπ ∈ {0.0001, 0.0003, 0.0005, 0.0007}, soft update parameters τs ∈ {0.001, 0.005, 0.003}, entropy regularization parameters ∈ {0.01, 0.03, 0.1}, and also experimented with different prediction lengths lf ∈ {5, 15, 20} (a grid-enumeration sketch is given after the table). |
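
The non-stationary MuJoCo setup noted in the Open Datasets row (a target velocity drifting as vd(t) = a sin(wt)) is not released, but it can be approximated with a thin environment wrapper. The sketch below is a minimal illustration, not the authors' code: the wrapper class name, the tracking-error reward, the use of the `x_velocity` info key, and the default values of `a` and `omega` are all assumptions.

```python
import numpy as np
import gymnasium as gym


class SinusoidalTargetVelocityWrapper(gym.Wrapper):
    """Reward-level non-stationarity: the desired forward velocity drifts as
    v_d(t) = a * sin(omega * t), where t counts environment steps.

    Illustrative sketch only; the reward shaping below (negative absolute
    tracking error plus the control cost) is an assumption, not the paper's
    released implementation.
    """

    def __init__(self, env, a=1.0, omega=0.001):
        super().__init__(env)
        self.a = a          # amplitude of the target-velocity oscillation
        self.omega = omega  # angular frequency (per environment step)
        self.t = 0          # global step counter driving the non-stationarity

    def reset(self, **kwargs):
        # `t` is deliberately NOT reset, so the drift continues across episodes.
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.t += 1
        v_desired = self.a * np.sin(self.omega * self.t)
        # MuJoCo locomotion tasks (e.g., HalfCheetah-v4) expose the forward
        # velocity and control cost through `info`.
        v_actual = info.get("x_velocity", 0.0)
        reward = -abs(v_actual - v_desired) + info.get("reward_ctrl", 0.0)
        info["target_velocity"] = v_desired
        return obs, reward, terminated, truncated, info


# Example usage (hypothetical parameter values):
# env = SinusoidalTargetVelocityWrapper(gym.make("HalfCheetah-v4"),
#                                       a=1.5, omega=2 * np.pi / 10000)
```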
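
The hyperparameter grids quoted in the Experiment Setup row can be enumerated directly. The snippet below is a sketch under the assumption of a full Cartesian sweep; the configuration field names (`policy_lr`, `soft_update_tau`, `entropy_coef`, `prediction_length`) are hypothetical and not taken from the paper.

```python
from itertools import product

# Grids quoted in the Experiment Setup row; the full Cartesian product is an
# assumption about how the authors combined them.
GRID = {
    "policy_lr": [0.0001, 0.0003, 0.0005, 0.0007],   # learning rates λπ
    "soft_update_tau": [0.001, 0.005, 0.003],        # soft update parameters τs
    "entropy_coef": [0.01, 0.03, 0.1],               # entropy regularization
    "prediction_length": [5, 15, 20],                # forecast lengths lf
}


def iter_configs(grid):
    """Yield one dict per hyperparameter combination (4 * 3 * 3 * 3 = 108 runs)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    for i, cfg in enumerate(iter_configs(GRID)):
        print(i, cfg)
```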