Robust Policy Learning via Offline Skill Diffusion

Authors: Woo Kyung Kim, Minjong Yoo, Honguk Woo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.
Researcher Affiliation | Academia | Woo Kyung Kim, Minjong Yoo, Honguk Woo*, Department of Computer Science and Engineering, Sungkyunkwan University ({kwk2696, mjyoo2, hwoo}@skku.edu)
Pseudocode | Yes | Algorithm 1 lists the learning procedures of DuSkill:

Algorithm 1: Offline Skill Diffusion
Input: Training datasets D, total denoising steps K, guidance weight δ, hyperparameters βρ, βσ
1: Initialize encoders qρ, qσ, priors pρ, pσ, decoders ϵρ, ϵσ
2: while not converged do
3:   Sample a batch {(s, a, ω)}i ∼ D
4:   Update qρ and qσ using L_DHVAE in (6)
5:   Update pρ and pσ using L_prior in (8)
6:   Update ϵρ and ϵσ using L_rec in (10)
7: end while
8: return qρ, qσ, pρ, pσ, ϵρ, ϵσ
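Since equations (6), (8), and (10) are not reproduced in this report, the following is only a minimal PyTorch sketch of Algorithm 1's control flow. All network sizes, learning rates, and the three loss bodies are placeholder assumptions, not the paper's actual objectives.

```python
# Sketch of Algorithm 1's outer loop. Dimensions, learning rates, and the
# three losses are illustrative placeholders for L_DHVAE (eq. 6),
# L_prior (eq. 8), and L_rec (eq. 10), which are not given in this excerpt.
import torch
import torch.nn as nn

S_DIM, A_DIM, Z_DIM = 39, 4, 10   # hypothetical state/action/skill sizes
K = 50                            # total denoising steps (illustrative)

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

# Encoders q_rho/q_sigma, priors p_rho/p_sigma, diffusion decoders eps_rho/eps_sigma
q_rho, q_sigma = mlp(S_DIM + A_DIM, Z_DIM), mlp(S_DIM + A_DIM, Z_DIM)
p_rho, p_sigma = mlp(S_DIM, Z_DIM), mlp(S_DIM, Z_DIM)
eps_rho, eps_sigma = mlp(Z_DIM + 1, Z_DIM), mlp(Z_DIM + 1, Z_DIM)

# Three optimizers mirror the three sequential updates in lines 4-6
opts = {
    "enc": torch.optim.Adam(list(q_rho.parameters()) + list(q_sigma.parameters()), lr=3e-4),
    "prior": torch.optim.Adam(list(p_rho.parameters()) + list(p_sigma.parameters()), lr=3e-4),
    "dec": torch.optim.Adam(list(eps_rho.parameters()) + list(eps_sigma.parameters()), lr=3e-4),
}

def step(loss, opt):
    opt.zero_grad()
    loss.backward()
    opt.step()

for it in range(1000):  # stands in for "while not converged"
    # line 3: sample a batch {(s, a, omega)} ~ D (random stand-in data here)
    s, a = torch.randn(32, S_DIM), torch.randn(32, A_DIM)
    omega = torch.randint(0, 4, (32,))  # domain/task label, unused by these placeholders
    sa = torch.cat([s, a], dim=-1)
    z_rho, z_sigma = q_rho(sa), q_sigma(sa)

    # line 4: placeholder for L_DHVAE (eq. 6), regularizing both posteriors
    step((z_rho ** 2).mean() + (z_sigma ** 2).mean(), opts["enc"])

    # line 5: placeholder for L_prior (eq. 8), matching priors to detached posteriors
    loss_prior = ((p_rho(s) - z_rho.detach()) ** 2).mean() \
               + ((p_sigma(s) - z_sigma.detach()) ** 2).mean()
    step(loss_prior, opts["prior"])

    # line 6: placeholder for L_rec (eq. 10), an epsilon-prediction denoising loss
    k = torch.randint(0, K, (32, 1)).float() / K
    noise = torch.randn(32, Z_DIM)
    loss_rec = ((eps_rho(torch.cat([z_rho.detach() + noise, k], -1)) - noise) ** 2).mean() \
             + ((eps_sigma(torch.cat([z_sigma.detach() + noise, k], -1)) - noise) ** 2).mean()
    step(loss_rec, opts["dec"])
```

The guidance weight δ from Algorithm 1's input is omitted above, since it plausibly governs guided sampling at inference time rather than the training updates shown here.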
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available.
Open Datasets | Yes | For evaluation, we use the multi-stage Meta-World, which is implemented based on the Meta-World simulated benchmark (Yu et al. 2019).
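For context, the underlying benchmark is publicly available. Below is a minimal sketch of instantiating a single standard task with the open-source metaworld package; the paper's multi-stage variant is a custom extension not shown here, the task name is only an example, and the API follows earlier metaworld releases (newer releases follow the Gymnasium five-tuple step convention).

```python
# Minimal sketch: loading one standard Meta-World task with the public
# `metaworld` package (Yu et al. 2019). The paper's multi-stage variant
# is a custom extension built on this benchmark and is not shown here.
import random
import metaworld

mt1 = metaworld.MT1('pick-place-v2')          # task name is illustrative
env = mt1.train_classes['pick-place-v2']()    # construct the environment
env.set_task(random.choice(mt1.train_tasks))  # sample a goal configuration
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)    # older gym-style step API
```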
Dataset Splits | No | The paper describes the collection of training and few-shot imitation datasets but does not specify explicit training/validation/test dataset splits, percentages, or absolute sample counts for each split.
Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | No | The paper describes the overall framework and training objectives but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other configurations necessary for reproduction.