SafeDICE: Offline Safe Imitation Learning with Non-Preferred Demonstrations

Authors: Youngsoo Jang, Geon-Hyeong Kim, Jongmin Lee, Sungryull Sohn, Byoungjip Kim, Honglak Lee, Moontae Lee

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section, we show the experimental results on various tasks from constrained RL benchmarks. First, we conduct the evaluation on domains from Real-World RL (RWRL) suite [8], which provides challenges in real-world scenarios including safety constraints. Second, we also conduct the experiment on continuous control tasks from Safety Gym environment [24], which provides practical scenarios of safety issues." |
| Researcher Affiliation | Collaboration | Youngsoo Jang¹, Geon-Hyeong Kim¹, Jongmin Lee², Sungryull Sohn¹, Byoungjip Kim¹, Honglak Lee¹, Moontae Lee¹˒³ (¹LG AI Research; ²University of California, Berkeley; ³University of Illinois Chicago) |
| Pseudocode | Yes | "The pseudocode for the whole process of SafeDICE can be found in Appendix B." |
| Open Source Code | Yes | "Our code is available on https://github.com/jys5609/SafeDICE." |
| Open Datasets | No | "Since there is no standard dataset for offline IL considering the safety constraints, we collected data by training the online RL agents. ... Then, we generated scarce but labeled non-preferred demonstrations D_N from the non-preferred policy, and abundant but unlabeled demonstrations D_U from both preferred and non-preferred policies." The paper describes how the authors generated their own datasets but provides no concrete access information (link, DOI, or citation) for them. |
| Dataset Splits | No | The paper describes how demonstrations were generated and used (e.g., 'scarce but labeled non-preferred demonstrations D_N', 'abundant but unlabeled demonstrations D_U') but does not provide specific training/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., exact GPU/CPU models, memory amounts, or other machine specifications). |
| Software Dependencies | No | "We implement SafeDICE based on the Real-World RL (RWRL) suite [8] and codebase of DemoDICE [13], which is one of recent DICE-based offline IL algorithms." The paper names these frameworks and codebases but gives no specific version numbers (e.g., for Python, PyTorch/TensorFlow, or other libraries). |
| Experiment Setup | Yes | "Table 2: Configurations of hyperparameters used in our experimental results on RWRL environment." "Table 4: Configurations of hyperparameters used in our experimental results on Safety Gym environment." These tables list concrete values for hyperparameters such as the discount factor, learning rate, network size, batch size, and number of training iterations. |
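The dataset construction quoted above (a scarce labeled set D_N from the non-preferred policy, plus an abundant unlabeled mixture D_U from both policies) can be sketched as follows. This is a minimal illustration only: the helper name `build_il_datasets` and the sample counts are hypothetical and are not taken from the paper or its codebase.

```python
import random

def build_il_datasets(pref_trajs, nonpref_trajs,
                      n_labeled=5, n_unlabeled_nonpref=20, seed=0):
    """Illustrative sketch of the described data setup (names/counts hypothetical):
    D_N: scarce, labeled non-preferred demonstrations.
    D_U: abundant, unlabeled mixture of preferred and non-preferred demonstrations.
    """
    rng = random.Random(seed)
    # D_N: a small labeled sample drawn only from the non-preferred policy.
    d_n = rng.sample(nonpref_trajs, n_labeled)
    # D_U: preferred trajectories mixed with additional non-preferred ones,
    # shuffled so that the preference labels are effectively discarded.
    d_u = list(pref_trajs) + rng.sample(nonpref_trajs, n_unlabeled_nonpref)
    rng.shuffle(d_u)
    return d_n, d_u
```

Under this sketch, an evaluator could vary `n_labeled` and `n_unlabeled_nonpref` to reproduce the "scarce vs. abundant" asymmetry the paper describes.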