Distance Weighted Supervised Learning for Offline Interaction Data
Authors: Joey Hejna, Jensen Gao, Dorsa Sadigh
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across all datasets we test, DWSL empirically maintains behavior cloning as a lower bound while still exhibiting policy improvement. In high-dimensional image domains, DWSL surpasses the performance of both prior goal-conditioned IL and RL algorithms. Visualizations and code can be found at https://sites.google.com/view/dwsl/home. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Stanford University. Correspondence to: Joey Hejna <jhejna@cs.stanford.edu>. |
| Pseudocode | Yes | Appendix B. DWSL Algorithm. Algorithm 1 Distance Weighted Supervised Learning. |
| Open Source Code | Yes | Visualizations and code can be found at https://sites.google.com/view/dwsl/home. |
| Open Datasets | Yes | Gym robotics environments from Plappert et al. (2018), including Fetch and Hand... The Franka Kitchen dataset from Lynch et al. (2019)... We take the Square and Can datasets from Mandlekar et al. (2021)... |
| Dataset Splits | Yes | At test time, we sample goals from the end of a held-out set of validation trajectories... We use the same train/test split as done in Cui et al. (2022), and use the final states of demo trajectories as goal states. We use the same 90% random, 10% expert split from Ma et al. (2022). |
| Hardware Specification | No | The paper only implies GPU training through its discussion of image encoders and network architectures (e.g., 'We train encoders with gradients from the policy...'); it does not specify particular GPU models, CPU models, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries used in the implementation. |
| Experiment Setup | Yes | We use the Adam optimizer for all experiments. For all methods, we relabel goals for each state by sampling uniformly from all future states in its trajectory. For baselines that use discount factors, we use γ = 0.98 for Gym Robotics environments, and γ = 0.99 for the remaining environments. We ran four seeds for all state experiments, and 3 seeds for all image experiments. For algorithm-specific hyperparameters, we include them in Table 9. Max clip refers to the maximum exponentiated advantage weight we clip to. For other hyperparameters common to all methods, we include them in Table 10. |
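
The Experiment Setup and Dataset Splits rows both quote the paper's goal relabeling scheme: goals for each state are sampled uniformly from all future states in the same trajectory. The sketch below illustrates one plausible way to implement that hindsight relabeling step; the function name, data layout, and the returned step-distance are illustrative assumptions, not code from the paper.

```python
import numpy as np

def relabel_goals(trajectory, rng=None):
    """Hindsight goal relabeling sketch.

    For each state in `trajectory`, sample a goal uniformly from the states
    that occur later in the same trajectory, as described in the quoted
    experiment setup. Returns (state, relabeled_goal, distance_in_steps)
    tuples; the step distance is included only for illustration.
    """
    rng = rng or np.random.default_rng()
    pairs = []
    T = len(trajectory)
    for t in range(T - 1):
        # Uniformly pick an index strictly after t (a future state in the same trajectory).
        g = int(rng.integers(t + 1, T))
        pairs.append((trajectory[t], trajectory[g], g - t))
    return pairs
```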
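The hyperparameter notes above also mention a "max clip" on the exponentiated advantage weight. The snippet below is a minimal sketch of how such a clipped, advantage-weighted behavior-cloning objective is commonly written; the `temperature` parameter, the exact advantage definition, and the function signature are assumptions for illustration rather than the paper's implementation.

```python
import torch

def weighted_bc_loss(log_probs, advantages, temperature=1.0, max_clip=10.0):
    """Advantage-weighted supervised (behavior cloning) loss sketch.

    log_probs:  log pi(a | s, g) for dataset actions, shape (batch,)
    advantages: estimated advantages for each (s, a, g) tuple, shape (batch,)
    Weights are exp(advantage / temperature), clamped from above by `max_clip`,
    mirroring the "maximum exponentiated advantage weight we clip to".
    """
    with torch.no_grad():
        weights = torch.exp(advantages / temperature).clamp(max=max_clip)
    # Weighted negative log-likelihood: higher-advantage actions get larger weight.
    return -(weights * log_probs).mean()
```

In this kind of setup, the loss would typically be minimized with the Adam optimizer noted in the Experiment Setup row.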