Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Periodic Skill Discovery

Authors: Jonghae Park, Daesol Cho, Jusuk Lee, Dongseok Shim, Inkyu Jang, H. Jin Kim

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The main goal of our experiments is to demonstrate that PSD can discover diverse periodic skills across multiple timescales by learning a circular latent representation. We also evaluate whether the discovered skills are useful for solving downstream tasks. In addition, we examine the scalability of PSD to high-dimensional observations such as pixel inputs. Finally, we explore the potential of combining PSD with existing unsupervised skill discovery methods to enhance the agent's behavioral diversity. ... Table 1: Comparison of downstream task performance. We evaluate PSD against existing skill discovery methods. High-level policies are trained using PPO with the skill policies kept frozen. All reported values are average returns over 10 seeds.
Researcher Affiliation	Academia	Jonghae Park¹, Daesol Cho², Jusuk Lee¹, Dongseok Shim¹, Inkyu Jang¹, H. Jin Kim¹; ¹Seoul National University, ²Georgia Institute of Technology
Pseudocode	Yes	Algorithm 1 Periodic Skill Discovery (PSD) ... Algorithm 2 Adaptive Sampling Method ... Algorithm 3 PSD combined with METRA
Open Source Code	Yes	Our code and demos are available at https://jonghaepark.github.io/psd
Open Datasets	Yes	Experimental Setup: We evaluate PSD on five robotic locomotion tasks in the MuJoCo environment [10, 91], both in state and pixel domain: Ant, HalfCheetah, Humanoid, Hopper, and Walker2D (Figures 4 and 7).
Dataset Splits	No	Experimental Setup: We evaluate PSD on five robotic locomotion tasks in the MuJoCo environment [10, 91], both in state and pixel domain... Episode lengths are set to 200 timesteps for Ant and HalfCheetah, and 400 timesteps for Humanoid, Hopper, and Walker2D. ... All reported values are average returns over 10 seeds.
Hardware Specification	Yes	All experiments are conducted on an NVIDIA A6000 GPU, and training for each task typically completes within 24 hours.
Software Dependencies	No	We implement PSD on top of the publicly available PyTorch SAC implementation¹. For fair comparison, we implement all baseline methods within the same codebase as PSD to ensure consistency in training procedures and infrastructure. To train the high-level policy for downstream tasks, we use PPO implemented in a public PyTorch repository². All experiments are conducted on an NVIDIA A6000 GPU, and training for each task typically completes within 24 hours. Footnote 1: https://github.com/pranz24/pytorch-soft-actor-critic Footnote 2: https://github.com/nikhilbarhate99/PPO-PyTorch
Experiment Setup	Yes	Appendix C.2 Implementation Details ... The full set of hyperparameters is summarized in Table 2. Table 2: Hyperparameters for training PSD. Learning rate: 1 × 10⁻⁴; Discount factor γ: 0.99; Optimizer: Adam [48]; N of episodes per epoch: 8; N of gradient steps per epoch: 64; Replay buffer size: 5 × 10⁵; Minibatch size: 1024 (ϕL), 256 (others); Target smoothing coefficient: 0.995; Entropy coefficient: auto-tuned; Circular latent dimension d: {3, 6}; Output dimension of the positional encoding D: 8; r_PSD κ: 10; J_PSD ϵ: 10⁻⁵; J_PSD k: 0.5; J_PSD λ1: 5 (Ant, HalfCheetah), 10 (Humanoid, Hopper, Walker2D); J_PSD λ2: 5 (Ant, HalfCheetah), 10 (Humanoid, Hopper, Walker2D); N of hidden layers: 2; N of hidden units per layer: 1024; Step size of adaptive sampling N: 1; Adaptive sampling interval: 2000 episodes; N of evaluation episodes for adaptive sampling: 5; Thresholds (α, β) for adaptive sampling: (0.9, 0.4)
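
For readers who want to carry the values quoted in the Experiment Setup and Dataset Splits rows into their own runs, they can be collected into one configuration object. The following is a minimal Python sketch, not the authors' code: the class name `PSDConfig`, the helper `make_config`, and all field names are hypothetical. The exponents (10⁻⁴, 10⁵, 10⁻⁵) are read from the flattened table text and should be checked against Table 2 of the paper. The quoted text does not say which latent dimension (3 or 6) applies to which task, so the sketch leaves that choice to the caller.

```python
from dataclasses import dataclass

# Task grouping as quoted in the rows above (episode length and lambda values).
SHORT_EPISODE_ENVS = ("Ant", "HalfCheetah")
LONG_EPISODE_ENVS = ("Humanoid", "Hopper", "Walker2D")


@dataclass(frozen=True)
class PSDConfig:
    """Hypothetical container for the quoted Table 2 values (names are illustrative)."""
    env: str
    episode_length: int            # 200 or 400 timesteps, depending on the task
    latent_dim: int                # circular latent dimension d, one of {3, 6}
    lambda_1: float                # J_PSD lambda1
    lambda_2: float                # J_PSD lambda2
    learning_rate: float = 1e-4    # assumed reading of "1 10 4"
    discount: float = 0.99
    episodes_per_epoch: int = 8
    gradient_steps_per_epoch: int = 64
    replay_buffer_size: int = 500_000   # assumed reading of "5 105"
    minibatch_size_phi_l: int = 1024
    minibatch_size_other: int = 256
    target_smoothing: float = 0.995
    positional_encoding_dim: int = 8
    kappa: float = 10.0            # listed under r_PSD
    epsilon: float = 1e-5          # listed under J_PSD, assumed reading of "10 5"
    k: float = 0.5                 # listed under J_PSD
    hidden_layers: int = 2
    hidden_units: int = 1024
    adaptive_sampling_step: int = 1
    adaptive_sampling_interval: int = 2000      # episodes
    adaptive_sampling_eval_episodes: int = 5
    adaptive_sampling_thresholds: tuple = (0.9, 0.4)  # (alpha, beta)


def make_config(env: str, latent_dim: int = 3) -> PSDConfig:
    """Build a config for one task; latent_dim defaults to 3 only for illustration."""
    if latent_dim not in (3, 6):
        raise ValueError("latent_dim must be 3 or 6")
    if env in SHORT_EPISODE_ENVS:
        return PSDConfig(env, episode_length=200, latent_dim=latent_dim,
                         lambda_1=5.0, lambda_2=5.0)
    if env in LONG_EPISODE_ENVS:
        return PSDConfig(env, episode_length=400, latent_dim=latent_dim,
                         lambda_1=10.0, lambda_2=10.0)
    raise ValueError(f"unknown environment: {env}")
```

Calling `make_config("Hopper", latent_dim=6)` yields a frozen configuration with the 400-step episode length and λ1 = λ2 = 10 quoted for that task, and any task outside the five listed raises a `ValueError` rather than silently picking defaults.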