Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching

Authors: Lantao Yu, Tianhe Yu, Jiaming Song, Willie Neiswanger, Stefano Ermon

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical study shows that our method significantly outperforms the best prior offline IL method in six standard continuous control environments with over 30% performance gain on average, across 22 settings where the imperfect dataset is highly suboptimal.
Researcher Affiliation | Collaboration | Lantao Yu*¹, Tianhe Yu*¹, Jiaming Song², Willie Neiswanger¹, Stefano Ermon¹; ¹Computer Science Department, Stanford University; ²NVIDIA (Work done while at Stanford)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available.
Open Datasets | Yes | We consider offline datasets of four MuJoCo (Todorov, Erez, and Tassa 2012) locomotion environments (hopper, halfcheetah, walker2d and ant) and two Adroit robotic manipulation environments (hammer and relocate) from the standard offline RL benchmark D4RL (Fu et al. 2020). (A hedged loading sketch is given below the table.)
Dataset Splits | No | The paper describes the construction of datasets by mixing expert and random data from D4RL, but it does not specify explicit training, validation, and test splits (e.g., percentages or counts) for these combined datasets that would allow the data partitioning to be reproduced. (One possible mixing scheme is sketched below the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using multilayer perceptron (MLP) networks and refers to 'gradient penalty (Gulrajani et al. 2017)' but does not provide specific version numbers for any software, libraries, or frameworks used in the implementation. (The standard form of the cited gradient penalty is sketched below the table.)
Experiment Setup | Yes | For all the tasks, we use α = 0.2 for RelaxDICE and use α = 0.05 for DemoDICE as suggested in (Kim et al. 2021), which is also verified in our experiments. We pick α and β for RelaxDICE-DRC via grid search, which we will discuss in the appendix. For more details of the experiment set-ups, evaluation protocols, hyperparameters and practical implementations, please see the appendix. (These settings are collected in a configuration sketch below the table.)
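
The Open Datasets row quotes the paper's use of D4RL locomotion and Adroit environments. Below is a minimal Python sketch of how such datasets are typically loaded with the d4rl package; the "-expert"/"-random" quality tags and the "-v2" version suffix are assumptions, since the exact dataset identifiers are not stated in this table.

```python
# Minimal sketch of loading D4RL datasets for the environments named above.
# The "-expert"/"-random" quality tags and the "-v2" version suffix are assumptions;
# the paper's exact dataset identifiers are not given in this table.
import gym
import d4rl  # registers the D4RL offline environments with gym

ENV_NAMES = ["hopper", "halfcheetah", "walker2d", "ant", "hammer", "relocate"]

def load_dataset(env_name: str, quality: str = "expert", version: str = "v2"):
    """Return a D4RL transition dataset as a dict of numpy arrays."""
    env = gym.make(f"{env_name}-{quality}-{version}")
    data = env.get_dataset()  # keys include observations, actions, rewards, terminals
    return data

if __name__ == "__main__":
    expert = load_dataset("hopper", "expert")
    print({k: v.shape for k, v in expert.items() if hasattr(v, "shape")})
```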
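The Dataset Splits row notes that the imperfect datasets are built by mixing expert and random D4RL data without stated proportions. The sketch below illustrates one generic way such a mixture could be assembled; the helper name mix_datasets and the transition counts are hypothetical placeholders, not the paper's construction.

```python
# Hedged sketch of one way to build an "imperfect" dataset by mixing expert and
# random D4RL transitions. The mixing ratio and transition counts below are
# illustrative placeholders, not the paper's actual construction.
import numpy as np

def mix_datasets(expert: dict, random_data: dict,
                 n_expert: int = 10_000, n_random: int = 990_000):
    """Concatenate the first n_expert expert transitions with n_random random ones."""
    keys = ["observations", "actions", "rewards", "terminals"]
    return {
        k: np.concatenate([expert[k][:n_expert], random_data[k][:n_random]], axis=0)
        for k in keys
    }
```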
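The Software Dependencies row mentions the gradient penalty of Gulrajani et al. (2017). The sketch below shows the standard form of that penalty in PyTorch, assuming a scalar-output discriminator; it is a generic illustration, not the paper's implementation.

```python
# Standard gradient-penalty term from Gulrajani et al. (2017), shown as a generic
# PyTorch sketch; the paper cites the technique but its exact implementation is not given.
import torch

def gradient_penalty(discriminator, real, fake, coeff: float = 10.0):
    """Penalize deviations of the discriminator's gradient norm from 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = discriminator(interp)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=interp, create_graph=True)[0]
    return coeff * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```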
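The Experiment Setup row fixes α = 0.2 for RelaxDICE and α = 0.05 for DemoDICE, and reports a grid search over α and β for RelaxDICE-DRC. The sketch below records those settings as a configuration; the candidate grids for RelaxDICE-DRC are hypothetical stand-ins, since the actual search space is deferred to the paper's appendix.

```python
# Hedged sketch of the hyperparameter settings quoted above. The alpha values for
# RelaxDICE and DemoDICE come from the paper's text; the RelaxDICE-DRC grids below
# are hypothetical placeholders standing in for the grid search described in the appendix.
from itertools import product

HYPERPARAMS = {
    "RelaxDICE": {"alpha": 0.2},
    "DemoDICE": {"alpha": 0.05},  # as suggested in Kim et al. (2021)
}

# Hypothetical candidate grids for RelaxDICE-DRC; the actual search space is in the appendix.
ALPHA_GRID = [0.05, 0.1, 0.2, 0.5]
BETA_GRID = [0.1, 0.5, 1.0]

RELAXDICE_DRC_GRID = [{"alpha": a, "beta": b} for a, b in product(ALPHA_GRID, BETA_GRID)]
```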