Robust Policy Learning via Offline Skill Diffusion

Authors: Woo Kyung Kim, Minjong Yoo, Honguk Woo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.
Researcher Affiliation | Academia | Woo Kyung Kim, Minjong Yoo, Honguk Woo*, Department of Computer Science and Engineering, Sungkyunkwan University ({kwk2696, mjyoo2, hwoo}@skku.edu)
Pseudocode | Yes | Algorithm 1 lists the learning procedures of DuSkill:

Algorithm 1: Offline Skill Diffusion
Input: Training datasets D, total denoising steps K, guidance weight δ, hyperparameters βρ, βσ
1: Initialize encoders qρ, qσ, priors pρ, pσ, decoders ϵρ, ϵσ
2: while not converged do
3:   Sample a batch {(s, a, ω)}i ∼ D
4:   Update qρ and qσ using L_DHVAE in (6)
5:   Update pρ and pσ using L_prior in (8)
6:   Update ϵρ and ϵσ using L_rec in (10)
7: end while
8: return qρ, qσ, pρ, pσ, ϵρ, ϵσ
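Since equations (6), (8), and (10) are not reproduced in this report, the following is only a minimal PyTorch sketch of Algorithm 1's control flow. All network sizes, learning rates, and the three loss bodies are placeholder assumptions, not the paper's actual objectives.

```python
# Sketch of Algorithm 1's outer loop. Dimensions, learning rates, and the
# three losses are illustrative placeholders for L_DHVAE (eq. 6),
# L_prior (eq. 8), and L_rec (eq. 10), which are not given in this excerpt.
import torch
import torch.nn as nn

S_DIM, A_DIM, Z_DIM = 39, 4, 10   # hypothetical state/action/skill sizes
K = 50                            # total denoising steps (illustrative)

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

# Encoders q_rho/q_sigma, priors p_rho/p_sigma, diffusion decoders eps_rho/eps_sigma
q_rho, q_sigma = mlp(S_DIM + A_DIM, Z_DIM), mlp(S_DIM + A_DIM, Z_DIM)
p_rho, p_sigma = mlp(S_DIM, Z_DIM), mlp(S_DIM, Z_DIM)
eps_rho, eps_sigma = mlp(Z_DIM + 1, Z_DIM), mlp(Z_DIM + 1, Z_DIM)

# Three optimizers mirror the three sequential updates in lines 4-6
opts = {
    "enc": torch.optim.Adam(list(q_rho.parameters()) + list(q_sigma.parameters()), lr=3e-4),
    "prior": torch.optim.Adam(list(p_rho.parameters()) + list(p_sigma.parameters()), lr=3e-4),
    "dec": torch.optim.Adam(list(eps_rho.parameters()) + list(eps_sigma.parameters()), lr=3e-4),
}

def step(loss, opt):
    opt.zero_grad()
    loss.backward()
    opt.step()

for it in range(1000):  # stands in for "while not converged"
    # line 3: sample a batch {(s, a, omega)} ~ D (random stand-in data here)
    s, a = torch.randn(32, S_DIM), torch.randn(32, A_DIM)
    omega = torch.randint(0, 4, (32,))  # domain/task label, unused by these placeholders
    sa = torch.cat([s, a], dim=-1)
    z_rho, z_sigma = q_rho(sa), q_sigma(sa)

    # line 4: placeholder for L_DHVAE (eq. 6), regularizing both posteriors
    step((z_rho ** 2).mean() + (z_sigma ** 2).mean(), opts["enc"])

    # line 5: placeholder for L_prior (eq. 8), matching priors to detached posteriors
    loss_prior = ((p_rho(s) - z_rho.detach()) ** 2).mean() \
               + ((p_sigma(s) - z_sigma.detach()) ** 2).mean()
    step(loss_prior, opts["prior"])

    # line 6: placeholder for L_rec (eq. 10), an epsilon-prediction denoising loss
    k = torch.randint(0, K, (32, 1)).float() / K
    noise = torch.randn(32, Z_DIM)
    loss_rec = ((eps_rho(torch.cat([z_rho.detach() + noise, k], -1)) - noise) ** 2).mean() \
             + ((eps_sigma(torch.cat([z_sigma.detach() + noise, k], -1)) - noise) ** 2).mean()
    step(loss_rec, opts["dec"])
```

The guidance weight δ from Algorithm 1's input is omitted above, since it plausibly governs guided sampling at inference time rather than the training updates shown here.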
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available.
Open Datasets | Yes | For evaluation, we use the multi-stage Meta-World, which is implemented based on the Meta-World simulated benchmark (Yu et al. 2019).
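For context, the underlying benchmark is publicly available. Below is a minimal sketch of instantiating a single standard task with the open-source metaworld package; the paper's multi-stage variant is a custom extension not shown here, the task name is only an example, and the API follows earlier metaworld releases (newer releases follow the Gymnasium five-tuple step convention).

```python
# Minimal sketch: loading one standard Meta-World task with the public
# `metaworld` package (Yu et al. 2019). The paper's multi-stage variant
# is a custom extension built on this benchmark and is not shown here.
import random
import metaworld

mt1 = metaworld.MT1('pick-place-v2')          # task name is illustrative
env = mt1.train_classes['pick-place-v2']()    # construct the environment
env.set_task(random.choice(mt1.train_tasks))  # sample a goal configuration
obs = env.reset()
action = env.action_space.sample()
obs, reward, done, info = env.step(action)    # older gym-style step API
```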
Dataset Splits | No | The paper describes the collection of training and few-shot imitation datasets but does not specify explicit training/validation/test dataset splits, percentages, or absolute sample counts for each split.
Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | No | The paper describes the overall framework and training objectives but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other configurations necessary for reproduction.