Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

Authors: Kai Yan, Alex Schwing, Yu-Xiong Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we find that PW-DICE improves upon several state-of-the-art methods.
Researcher Affiliation | Academia | The Grainger College of Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.
Pseudocode | Yes | Algorithm 1 PW-DICE
Open Source Code | Yes | The code is available at https://github.com/KaiYan289/PW-DICE.
Open Datasets | Yes | SMODICE uses a single trajectory (1000 states) from the expert-v2 dataset in D4RL (Fu et al., 2020b) as the expert dataset E.
Dataset Splits | No | The paper describes dataset usage for training and testing, and mentions batch sizes and training lengths, but does not provide explicit train/validation/test splits, such as percentages or per-split sample counts.
Hardware Specification | Yes | All experiments are carried out with a single NVIDIA RTX 2080Ti GPU on an Ubuntu 18.04 server with 72 Intel Xeon Gold 6254 CPUs @ 3.10GHz.
Software Dependencies | No | The paper mentions software such as CVXPY, Gurobi, MOSEK, OpenAI Gym, and D4RL, but does not specify version numbers for these or other key components (e.g., Python, PyTorch) needed to ensure reproducibility.
Experiment Setup | Yes | Tab. 1 summarizes our hyperparameters, which are also the hyperparameters of plain Behavior Cloning where applicable. For baselines (SMODICE, LobsDICE, ORIL, OTR, and DWBC), we use the hyperparameters reported in their papers (unless the hyperparameter values in the paper and the code differ, in which case we use the values from the code).