Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Pausing Policy Learning in Non-stationary Reinforcement Learning
Authors: Hyunin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations on three different environments also reveal that a nonzero policy hold duration yields higher rewards compared to continuous decision updates. |
| Researcher Affiliation | Academia | 1University of California, Berkeley 2Virginia Tech. |
| Pseudocode | Yes | Algorithm 1 Forecasting Online Reinforcement Learning |
| Open Source Code | No | The paper lists existing official codebases used for comparison (e.g., 'Official codes distributed from https://github.com/pranz24/pytorch-soft-actor-critic') but does not state that the authors release code for their own proposed method (FSAC). |
| Open Datasets | No | The paper describes custom environments, including a 'switching goal cliffworld' and modified 'Mujoco environments'. While MuJoCo is a well-known simulator, the specific non-stationary modifications (v_d(t) = a sin(wt)) are not publicly linked or formally cited in a way that allows direct access to the exact environment setup used, so reproduction requires custom implementation. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into train, validation, and test sets. |
| Hardware Specification | Yes | All experiments are conducted on 12 Intel Xeon CPU E5-2690 v4 and 2 Tesla V100 GPUs. |
| Software Dependencies | No | The paper lists software libraries used ('Pytorch', 'Open AI Gym', 'Numpy') but does not provide specific version numbers for these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For our experiments, we varied hyperparameters such as learning rates λπ ∈ {0.0001, 0.0003, 0.0005, 0.0007}, soft update parameters τs ∈ {0.001, 0.003, 0.005}, and the entropy regularization parameters ∈ {0.01, 0.03, 0.1}, and also experimented with different prediction lengths lf ∈ {5, 15, 20}. |
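The hyperparameter ranges quoted in the Experiment Setup row can be enumerated as a grid sweep. The sketch below is illustrative only; the variable names (`lr`, `tau`, `entropy`, `pred_len`) are our own and do not come from the authors' code:

```python
from itertools import product

# Hyperparameter ranges reported in the paper's experiment setup.
learning_rates = [0.0001, 0.0003, 0.0005, 0.0007]  # λπ
soft_updates = [0.001, 0.003, 0.005]               # τs
entropy_regs = [0.01, 0.03, 0.1]                   # entropy regularization
pred_lengths = [5, 15, 20]                         # lf (prediction length)

# Full Cartesian product of the sweep: 4 * 3 * 3 * 3 = 108 configurations.
grid = [
    {"lr": lr, "tau": tau, "entropy": ent, "pred_len": lf}
    for lr, tau, ent, lf in product(
        learning_rates, soft_updates, entropy_regs, pred_lengths
    )
]

print(len(grid))  # 108
```

Whether the authors ran the full 108-configuration product or a subset is not stated in the quoted excerpt.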