Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
Authors: Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Through extensive experiments, we show that SMODICE is effective for all three problem settings we consider and outperforms state-of-the-art methods in each respective setting. |
| Researcher Affiliation | Academia | ¹Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; ²University of Melbourne, Melbourne, Australia. |
| Pseudocode | Yes | Algorithm 1 (SMODICE), Algorithm 2 (SMODICE with χ²-divergence for tabular MDPs), and Algorithm 3 (SMODICE for continuous MDPs). A hedged sketch of the continuous-MDP update appears after this table. |
| Open Source Code | Yes | Project website: https://sites.google.com/view/smodice/home Code is available at: https://github.com/JasonMa2016/SMODICE |
| Open Datasets | Yes | We utilize the D4RL (Fu et al., 2021) offline RL dataset. For Hopper, Walker2d, HalfCheetah, Ant, and AntMaze, we construct the offline datasets by combining a small amount of expert data and a large amount of low-quality random data. For the first four tasks, we leverage the respective expert-v2 and random-v2 datasets in the D4RL benchmark. A loading sketch appears after this table. |
| Dataset Splits | No | The paper describes the composition of the offline datasets (e.g., 'mixture of small number of expert trajectories... and a large number of low-quality trajectories from the random-v2 dataset') and how expert data is used for demonstration and evaluation, but it does not specify explicit training/validation/test splits in terms of percentages or sample counts for model training or hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only refers to simulation environments such as MuJoCo and a Franka robot task, not the computational hardware. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam (Kingma & Ba, 2014)', the 'official PyTorch implementation', and 'NumPy (Harris et al., 2020)', but it does not provide version numbers for PyTorch or other key software libraries used in the experiments. |
| Experiment Setup | Yes | Table 1 (SMODICE Hyperparameters) lists specific values for the optimizer (Adam), critic learning rate (3e-4), discriminator learning rate (3e-4), actor learning rate (3e-5), mini-batch size (256), discount factor (0.99), actor mean and log-std clipping, and architecture details (hidden dimensions, number of layers, and activation functions) for the discriminator, critic, and actor networks. These values are gathered into a config sketch below. |
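To connect the pseudocode row to the quoted hyperparameters, here is a minimal PyTorch sketch of one SMODICE-style update in the continuous-MDP setting with a χ²-divergence. It assumes f(x) = ½(x−1)², so that the convex conjugate is f⋆(y) = ½(y+1)² − ½ and the behavioral-cloning weights are relu(y+1); the network sizes, the deterministic actor head, and all function names are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of Algorithm 3 (SMODICE for continuous MDPs) with a
# chi-square f-divergence. Sizes and the exact conjugate f* are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, act_dim, gamma = 11, 3, 0.99
disc = mlp(obs_dim, 1)           # state discriminator c(s)
critic = mlp(obs_dim, 1)         # Lagrangian value V(s)
actor = mlp(obs_dim, act_dim)    # deterministic head for brevity; the paper uses a Gaussian policy
opt_d = torch.optim.Adam(disc.parameters(), lr=3e-4)      # Table 1 learning rates
opt_v = torch.optim.Adam(critic.parameters(), lr=3e-4)
opt_pi = torch.optim.Adam(actor.parameters(), lr=3e-5)

def train_step(expert_s, s0, s, a, s_next):
    # 1) Discriminator: expert states (label 1) vs. offline states (label 0).
    logits_e, logits_o = disc(expert_s), disc(s)
    d_loss = F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e)) \
           + F.binary_cross_entropy_with_logits(logits_o, torch.zeros_like(logits_o))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) State reward R(s) = log c/(1-c), i.e. the discriminator logit,
    #    an estimate of the expert/offline log density ratio.
    with torch.no_grad():
        r = disc(s)

    # 3) Value objective: (1-gamma) E[V(s0)] + E[f*(R + gamma V(s') - V(s))],
    #    with f*(y) = 1/2 (y+1)^2 - 1/2 (assumed chi-square conjugate).
    e_v = r + gamma * critic(s_next) - critic(s)
    v_loss = (1 - gamma) * critic(s0).mean() + (0.5 * (e_v + 1) ** 2 - 0.5).mean()
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

    # 4) Weighted behavioral cloning with w = (f*)'(e_v) = relu(e_v + 1).
    with torch.no_grad():
        w = F.relu(r + gamma * critic(s_next) - critic(s) + 1)
    pi_loss = (w * ((actor(s) - a) ** 2).sum(-1, keepdim=True)).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()

# Example call with random placeholder batches (batch size 256, per Table 1):
B = 256
train_step(torch.randn(B, obs_dim), torch.randn(B, obs_dim),
           torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B, obs_dim))
```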
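The expert/random dataset mixture described in the Open Datasets row can be assembled along these lines. The sketch assumes the published d4rl API (`gym.make` on the registered `*-expert-v2`/`*-random-v2` tasks plus `d4rl.qlearning_dataset`); the transition cap `n_expert` is a hypothetical placeholder, since the exact expert/random ratio is not restated in this table.

```python
# Hedged sketch of the D4RL expert + random mixture described above.
import gym
import d4rl  # registers the *-expert-v2 / *-random-v2 environments
import numpy as np

expert = d4rl.qlearning_dataset(gym.make('hopper-expert-v2'))
random_ = d4rl.qlearning_dataset(gym.make('hopper-random-v2'))

n_expert = 200  # hypothetical cap; the paper uses a "small amount" of expert data
offline = {k: np.concatenate([expert[k][:n_expert], random_[k]], axis=0)
           for k in ('observations', 'actions', 'next_observations',
                     'rewards', 'terminals')}
```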
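Finally, the Table 1 values quoted in the Experiment Setup row, gathered into a single config dict. Entries whose numeric values are listed in the paper's table but not restated here (hidden width, depth, activation, clipping ranges) are marked as assumptions.

```python
# SMODICE hyperparameters as quoted from Table 1 of the paper.
SMODICE_HPARAMS = {
    'optimizer': 'Adam',          # Kingma & Ba, 2014
    'critic_lr': 3e-4,
    'discriminator_lr': 3e-4,
    'actor_lr': 3e-5,
    'batch_size': 256,
    'discount': 0.99,
    # The entries below appear in Table 1, but their values are not quoted
    # above; these numbers are assumptions for illustration only.
    'hidden_dim': 256,
    'num_layers': 2,
    'activation': 'ReLU',
}
```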