Pausing Policy Learning in Non-stationary Reinforcement Learning
Authors: Hyunin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluations on three different environments also reveal that a nonzero policy hold duration yields higher rewards compared to continuous decision updates. |
| Researcher Affiliation | Academia | ¹University of California, Berkeley; ²Virginia Tech. |
| Pseudocode | Yes | Algorithm 1 Forecasting Online Reinforcement Learning |
| Open Source Code | No | The paper lists existing official codebases used for comparison (e.g., 'Official codes distributed from https://github.com/pranz24/pytorch-soft-actor-critic') but does not state that the authors release code for their own proposed method (FSAC). |
| Open Datasets | No | The paper describes custom environments such as a 'switching goal cliffworld' and modified MuJoCo environments with a sinusoidally drifting target velocity v_d(t) = a·sin(ωt). While MuJoCo itself is a well-known simulator, these specific non-stationary modifications are not publicly linked or formally cited, so the exact environment setup cannot be accessed without a custom implementation (an illustrative sketch of the drift appears after the table). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning into train, validation, and test sets. |
| Hardware Specification | Yes | All experiments are conducted on 12 Intel Xeon CPU E5-2690 v4 and 2 Tesla V100 GPUs. |
| Software Dependencies | No | The paper lists software libraries used ('Pytorch', 'Open AI Gym', 'Numpy') but does not provide specific version numbers for these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For our experiments, we varied hyperparameters such as learning rates λ_π ∈ {0.0001, 0.0003, 0.0005, 0.0007}, soft update parameters τ_s ∈ {0.001, 0.005, 0.003}, and entropy regularization parameters ∈ {0.01, 0.03, 0.1}, and also experimented with different prediction lengths l_f ∈ {5, 15, 20} (a sketch of such a sweep follows the table). |
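
The non-stationarity noted in the Open Datasets row is specified only by the formula v_d(t) = a·sin(ωt). Below is a minimal sketch, not the authors' implementation: the velocity-tracking reward, the amplitude `a`, and the angular frequency `w` are illustrative assumptions, since the paper excerpt quoted here gives only the form of the drift.

```python
import numpy as np

# Illustrative sketch of a sinusoidally drifting target velocity
# v_d(t) = a * sin(w * t), as used in the paper's non-stationary MuJoCo
# variants. Amplitude, frequency, and the tracking reward are assumptions.

def target_velocity(t, a=1.5, w=0.02):
    """Desired forward velocity at (episode or step) index t."""
    return a * np.sin(w * t)

def tracking_reward(x_velocity, t, a=1.5, w=0.02, ctrl_cost=0.0):
    """Reward the agent for matching the drifting target velocity."""
    return -abs(x_velocity - target_velocity(t, a, w)) - ctrl_cost

# Example: the same forward velocity earns different rewards as the task drifts.
for t in (0, 100, 200):
    print(t, round(tracking_reward(x_velocity=1.0, t=t), 3))
```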
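
The Experiment Setup row quotes the hyperparameter grids but not how they were combined. The following is a minimal sketch of an exhaustive sweep over those grids; the dictionary keys and the `train_fn` entry point are hypothetical, and the paper does not state that a full Cartesian product was run.

```python
from itertools import product

# Hyperparameter grids quoted in the Experiment Setup row; key names are
# assumptions made for this illustration.
grid = {
    "lr_pi":        [0.0001, 0.0003, 0.0005, 0.0007],  # policy learning rate lambda_pi
    "tau_soft":     [0.001, 0.005, 0.003],             # soft update parameter tau_s
    "entropy_coef": [0.01, 0.03, 0.1],                 # entropy regularization
    "pred_len":     [5, 15, 20],                       # prediction length l_f
}

def run_sweep(train_fn):
    """Call train_fn once per configuration in the Cartesian product of the grids."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        train_fn(**dict(zip(keys, values)))

if __name__ == "__main__":
    # Replace the lambda with the actual training entry point.
    run_sweep(lambda **cfg: print(cfg))
```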