Distance Weighted Supervised Learning for Offline Interaction Data

Authors: Joey Hejna, Jensen Gao, Dorsa Sadigh

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across all datasets we test, DWSL empirically maintains behavior cloning as a lower bound while still exhibiting policy improvement. In high-dimensional image domains, DWSL surpasses the performance of both prior goal-conditioned IL and RL algorithms. Visualizations and code can be found at https://sites.google.com/view/dwsl/home.
Researcher Affiliation | Academia | Department of Computer Science, Stanford University. Correspondence to: Joey Hejna <jhejna@cs.stanford.edu>.
Pseudocode | Yes | Appendix B. DWSL Algorithm. Algorithm 1 Distance Weighted Supervised Learning. (A hedged sketch of the weighted-cloning update appears below the table.)
Open Source Code | Yes | Visualizations and code can be found at https://sites.google.com/view/dwsl/home.
Open Datasets | Yes | Gym robotics environments from Plappert et al. (2018), including Fetch and Hand... The Franka Kitchen dataset from Lynch et al. (2019)... We take the Square and Can datasets from Mandlekar et al. (2021)...
Dataset Splits | Yes | At test time, we sample goals from the end of a held-out set of validation trajectories... We use the same train/test split as done in Cui et al. (2022), and use the final states of demo trajectories as goal states. We use the same 90% random, 10% expert split from Ma et al. (2022).
Hardware Specification | No | The paper implies GPU training through its discussion of image encoders and network architectures (e.g., 'We train encoders with gradients from the policy...'), but it does not specify particular GPU models, CPU models, or other hardware used for the experiments.
Software Dependencies | No | The paper does not provide version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries used in the implementation.
Experiment Setup | Yes | We use the Adam optimizer for all experiments. For all methods, we relabel goals for each state by sampling uniformly from all future states in its trajectory. For baselines that use discount factors, we use γ = 0.98 for Gym Robotics environments and γ = 0.99 for the remaining environments. We ran four seeds for all state experiments and three seeds for all image experiments. Algorithm-specific hyperparameters are listed in Table 9; max clip refers to the maximum exponentiated advantage weight we clip to. Hyperparameters common to all methods are listed in Table 10. (Sketches of the goal relabeling and the clipped advantage weighting described here appear below the table.)
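
The goal-relabeling scheme quoted in the setup row, sampling a goal uniformly from all future states in the same trajectory, can be sketched as follows. The trajectory layout, the assumption that goals are simply future states, and the function name are illustrative guesses, not the paper's implementation.

```python
import numpy as np

def relabel_goals(states, actions, rng=None):
    """Hindsight goal relabeling: for each state, sample a goal uniformly
    from all future states in its trajectory, as described in the setup row.

    states: array of shape (T, state_dim); actions: array of shape (T, act_dim).
    Returns (state, action, goal) tuples. Treating raw future states as goals
    is an assumption; the paper's goal representation may differ per environment.
    """
    rng = rng or np.random.default_rng()
    T = len(states)
    relabeled = []
    for t in range(T - 1):
        future = rng.integers(t + 1, T)  # uniform over indices t+1, ..., T-1
        relabeled.append((states[t], actions[t], states[future]))
    return relabeled
```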
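
Algorithm 1 itself is given in Appendix B of the paper. As a rough orientation only, below is a minimal sketch of an advantage-weighted, goal-conditioned behavior-cloning update of the kind the "max clip" hyperparameter refers to. The distance model `dist_model`, the policy interface, the temperature `beta`, and the default values are assumptions for illustration; DWSL's actual learned distance estimator and exact advantage definition are in the paper and are not reproduced here.

```python
import torch

def weighted_bc_loss(policy, dist_model, states, actions, next_states, goals,
                     beta=0.05, max_clip=10.0):
    """Advantage-weighted goal-conditioned behavior cloning (sketch).

    dist_model(s, g) is assumed to return an estimated number of steps from
    state s to goal g. beta and max_clip stand in for the temperature and the
    "max clip" value from Table 9; the numbers here are placeholders.
    """
    with torch.no_grad():
        # One simple proxy for the advantage of the dataset action: how much
        # the estimated distance-to-goal shrinks after taking it (the paper's
        # exact advantage definition may differ).
        advantage = dist_model(states, goals) - (1.0 + dist_model(next_states, goals))
        # Exponentiate and clip the weights, per the "max clip" description.
        weights = torch.clamp(torch.exp(advantage / beta), max=max_clip)

    # Weighted negative log-likelihood of the dataset actions under the
    # goal-conditioned policy (assumed to return a torch distribution).
    log_prob = policy(states, goals).log_prob(actions).sum(dim=-1)
    return -(weights * log_prob).mean()
```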