Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching
Authors: Yecheng Jason Ma, Andrew Shen, Dinesh Jayaraman, Osbert Bastani
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate SMODICE on both gridworld environments as well as on high-dimensional offline benchmarks. Through extensive experiments, we show that SMODICE is effective for all three problem settings we consider and outperforms state-of-the-art methods in each respective setting. |
| Researcher Affiliation | Academia | ¹Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA; ²University of Melbourne, Melbourne, Australia. |
| Pseudocode | Yes | Algorithm 1 (SMODICE), Algorithm 2 (SMODICE with χ²-divergence for tabular MDPs), and Algorithm 3 (SMODICE for continuous MDPs). A hedged sketch of the continuous-MDP update appears after this table. |
| Open Source Code | Yes | Project website: https://sites.google.com/view/smodice/home Code is available at: https://github.com/JasonMa2016/SMODICE |
| Open Datasets | Yes | We utilize the D4RL (Fu et al., 2021) offline RL dataset. For Hopper, Walker2d, HalfCheetah, Ant, and AntMaze, we construct the offline datasets by combining a small amount of expert data and a large amount of low-quality random data. For the first four tasks, we leverage the respective expert-v2 and random-v2 datasets in the D4RL benchmark. A loading sketch appears after this table. |
| Dataset Splits | No | The paper describes the composition of the offline datasets (e.g., 'mixture of small number of expert trajectories... and a large number of low-quality trajectories from the random-v2 dataset') and how expert data is used for demonstration and evaluation, but it does not specify explicit training/validation/test splits in terms of percentages or sample counts for model training or hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only refers to simulation environments such as MuJoCo and a Franka robot task, not the computational hardware. |
| Software Dependencies | No | The paper mentions 'Optimizer Adam (Kingma & Ba, 2014)', the 'official PyTorch implementation', and 'NumPy (Harris et al., 2020)', but it does not provide version numbers for PyTorch or other key software libraries used in the experiments. |
| Experiment Setup | Yes | Table 1 (SMODICE Hyperparameters) lists specific values for the optimizer (Adam), critic learning rate (3e-4), discriminator learning rate (3e-4), actor learning rate (3e-5), mini-batch size (256), discount factor (0.99), actor mean and log-std clipping, and architecture details (hidden dimensions, number of layers, and activation functions) for the discriminator, critic, and actor networks. These values are gathered into a config sketch below. |
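To connect the pseudocode row to the quoted hyperparameters, here is a minimal PyTorch sketch of one SMODICE-style update in the continuous-MDP setting with a χ²-divergence. It assumes f(x) = ½(x−1)², so that the convex conjugate is f⋆(y) = ½(y+1)² − ½ and the behavioral-cloning weights are relu(y+1); the network sizes, the deterministic actor head, and all function names are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of Algorithm 3 (SMODICE for continuous MDPs) with a
# chi-square f-divergence. Sizes and the exact conjugate f* are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

obs_dim, act_dim, gamma = 11, 3, 0.99
disc = mlp(obs_dim, 1)           # state discriminator c(s)
critic = mlp(obs_dim, 1)         # Lagrangian value V(s)
actor = mlp(obs_dim, act_dim)    # deterministic head for brevity; the paper uses a Gaussian policy
opt_d = torch.optim.Adam(disc.parameters(), lr=3e-4)      # Table 1 learning rates
opt_v = torch.optim.Adam(critic.parameters(), lr=3e-4)
opt_pi = torch.optim.Adam(actor.parameters(), lr=3e-5)

def train_step(expert_s, s0, s, a, s_next):
    # 1) Discriminator: expert states (label 1) vs. offline states (label 0).
    logits_e, logits_o = disc(expert_s), disc(s)
    d_loss = F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e)) \
           + F.binary_cross_entropy_with_logits(logits_o, torch.zeros_like(logits_o))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) State reward R(s) = log c/(1-c), i.e. the discriminator logit,
    #    an estimate of the expert/offline log density ratio.
    with torch.no_grad():
        r = disc(s)

    # 3) Value objective: (1-gamma) E[V(s0)] + E[f*(R + gamma V(s') - V(s))],
    #    with f*(y) = 1/2 (y+1)^2 - 1/2 (assumed chi-square conjugate).
    e_v = r + gamma * critic(s_next) - critic(s)
    v_loss = (1 - gamma) * critic(s0).mean() + (0.5 * (e_v + 1) ** 2 - 0.5).mean()
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

    # 4) Weighted behavioral cloning with w = (f*)'(e_v) = relu(e_v + 1).
    with torch.no_grad():
        w = F.relu(r + gamma * critic(s_next) - critic(s) + 1)
    pi_loss = (w * ((actor(s) - a) ** 2).sum(-1, keepdim=True)).mean()
    opt_pi.zero_grad(); pi_loss.backward(); opt_pi.step()

# Example call with random placeholder batches (batch size 256, per Table 1):
B = 256
train_step(torch.randn(B, obs_dim), torch.randn(B, obs_dim),
           torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B, obs_dim))
```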
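The expert/random dataset mixture described in the Open Datasets row can be assembled along these lines. The sketch assumes the published d4rl API (`gym.make` on the registered `*-expert-v2`/`*-random-v2` tasks plus `d4rl.qlearning_dataset`); the transition cap `n_expert` is a hypothetical placeholder, since the exact expert/random ratio is not restated in this table.

```python
# Hedged sketch of the D4RL expert + random mixture described above.
import gym
import d4rl  # registers the *-expert-v2 / *-random-v2 environments
import numpy as np

expert = d4rl.qlearning_dataset(gym.make('hopper-expert-v2'))
random_ = d4rl.qlearning_dataset(gym.make('hopper-random-v2'))

n_expert = 200  # hypothetical cap; the paper uses a "small amount" of expert data
offline = {k: np.concatenate([expert[k][:n_expert], random_[k]], axis=0)
           for k in ('observations', 'actions', 'next_observations',
                     'rewards', 'terminals')}
```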
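Finally, the Table 1 values quoted in the Experiment Setup row, gathered into a single config dict. Entries whose numeric values are listed in the paper's table but not restated here (hidden width, depth, activation, clipping ranges) are marked as assumptions.

```python
# SMODICE hyperparameters as quoted from Table 1 of the paper.
SMODICE_HPARAMS = {
    'optimizer': 'Adam',          # Kingma & Ba, 2014
    'critic_lr': 3e-4,
    'discriminator_lr': 3e-4,
    'actor_lr': 3e-5,
    'batch_size': 256,
    'discount': 0.99,
    # The entries below appear in Table 1, but their values are not quoted
    # above; these numbers are assumptions for illustration only.
    'hidden_dim': 256,
    'num_layers': 2,
    'activation': 'ReLU',
}
```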