Robust Policy Learning via Offline Skill Diffusion
Authors: Woo Kyung Kim, Minjong Yoo, Honguk Woo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL. |
| Researcher Affiliation | Academia | Woo Kyung Kim, Minjong Yoo, Honguk Woo* Department of Computer Science and Engineering, Sungkyunkwan University {kwk2696, mjyoo2, hwoo}@skku.edu |
| Pseudocode | Yes | Algorithm 1 lists the learning procedures of DuSkill. Algorithm 1: Offline Skill Diffusion. Input: Training datasets D, total denoise step K, guidance weight δ, hyperparameters βρ, βσ. 1: Initialize encoders qρ, qσ, priors pρ, pσ, decoders ϵρ, ϵσ 2: while not converged do 3: Sample a batch {(s, a, ω)}ᵢ ∼ D 4: Update qρ and qσ using LDHVAE in (6) 5: Update pρ and pσ using Lprior in (8) 6: Update ϵρ and ϵσ using Lrec in (10) 7: end while 8: return qρ, qσ, pρ, pσ, ϵρ, ϵσ |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | For evaluation, we use the multi-stage Meta World, which is implemented based on the Meta-World simulated benchmark (Yu et al. 2019). |
| Dataset Splits | No | The paper describes the collection of training and few-shot imitation datasets but does not specify explicit training/validation/test dataset splits, percentages, or absolute sample counts for each split. |
| Hardware Specification | No | The paper does not mention any specific hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | No | The paper describes the overall framework and training objectives but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or other configurations necessary for reproduction. |
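The Algorithm 1 pseudocode quoted above alternates three update steps per batch: encoders via L_DHVAE (Eq. 6), priors via L_prior (Eq. 8), and decoders via L_rec (Eq. 10). The loop structure can be sketched as follows; this is a minimal hypothetical illustration, not the paper's implementation. The linear modules, synthetic dataset, and all three loss functions are placeholder stand-ins (simple MSE objectives) for the paper's actual networks and Eqs. (6), (8), and (10), which the review notes are not fully specified for reproduction.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearModule:
    """Placeholder module: a linear map trained by SGD on a squared-error loss."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))
    def __call__(self, x):
        return x @ self.W
    def sgd_step(self, x, target, lr=1e-2):
        # gradient of 0.5 * ||xW - target||^2 with respect to W
        err = self(x) - target
        self.W -= lr * x.T @ err / len(x)
        return float(np.mean(err ** 2))

def train_duskill_sketch(steps=200, batch=32, d_s=6, d_a=3, d_z=4, n=512):
    # Synthetic stand-in for the offline dataset D of (s, a, w) tuples.
    S = rng.normal(size=(n, d_s))
    A_map = rng.normal(size=(d_s, d_a))
    A = S @ A_map + 0.1 * rng.normal(size=(n, d_a))
    P = rng.normal(scale=0.5, size=(d_a, d_z))  # fixed projection for the encoder placeholder target

    # Encoders q_rho, q_sigma; priors p_rho, p_sigma; decoders eps_rho, eps_sigma (Algorithm 1, line 1).
    q_rho, q_sig = LinearModule(d_s, d_z), LinearModule(d_s, d_z)
    p_rho, p_sig = LinearModule(d_s, d_z), LinearModule(d_s, d_z)
    e_rho, e_sig = LinearModule(d_z, d_a), LinearModule(d_z, d_a)

    for _ in range(steps):  # "while not converged", truncated to a fixed step budget
        idx = rng.integers(0, n, size=batch)       # line 3: sample a batch from D
        s_b, a_b = S[idx], A[idx]
        # Line 4: update encoders -- placeholder for L_DHVAE, Eq. (6).
        l_enc = q_rho.sgd_step(s_b, a_b @ P)
        q_sig.sgd_step(s_b, a_b @ P)
        # Line 5: update priors -- placeholder for L_prior, Eq. (8): match encoder outputs.
        l_pri = p_rho.sgd_step(s_b, q_rho(s_b))
        p_sig.sgd_step(s_b, q_sig(s_b))
        # Line 6: update decoders -- placeholder for L_rec, Eq. (10): reconstruct actions from latents.
        l_rec = e_rho.sgd_step(q_rho(s_b), a_b)
        e_sig.sgd_step(q_sig(s_b), a_b)
    return l_enc, l_pri, l_rec
```

The three-phase update per batch mirrors lines 4-6 of the quoted pseudocode; the guidance weight δ and denoise step K appear only at inference in the paper's diffusion decoders, so they are omitted from this training-loop sketch.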