Score Models for Offline Goal-Conditioned Reinforcement Learning
Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on the fully offline GCRL benchmark composed of robot manipulation and locomotion tasks, including high-dimensional observations, show that SMORe can outperform state-of-the-art baselines by a significant margin. Our experiments study the effectiveness of the proposed GCRL algorithm SMORe on a set of simulated benchmarks against other GCRL methods that employ behavior cloning, RL with sparse reward, and contrastive learning. |
| Researcher Affiliation | Collaboration | Harshit Sikchi (University of Texas at Austin), Rohan Chitnis (Meta AI), Ahmed Touati (Meta AI), Alborz Geramifard (Meta AI), Amy Zhang (University of Texas at Austin; Meta AI), Scott Niekum (UMass Amherst) |
| Pseudocode | Yes | Algorithm 1: SMORe (a generic training-loop sketch appears below the table) |
| Open Source Code | Yes | Project page (Code and Videos): hari-sikchi.github.io/smore/ |
| Open Datasets | Yes | For locomotion tasks, we generate our dataset using the D4RL benchmark (Fu et al., 2020), combining a random or medium dataset with 30 episodes of expert data. The datasets used from D4RL are licensed under Apache 2.0. (A dataset-mixing sketch appears below the table.) |
| Dataset Splits | No | The paper mentions training iterations and hyperparameter searches but does not explicitly provide training/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used for the experiments. |
| Experiment Setup | Yes | Table 4 (hyperparameters for SMORe): policy learning rate 3e-4; value learning rate 3e-4; MLP layers (256, 256); batch size 512; mixture ratio β 0.5; policy temperature α 3.0. Table 5 (hyperparameters for image-observation GCRL, following Zheng et al., 2023): batch size 2048; 300 training epochs; learning rate 3e-4. (Both are collected into a config sketch below.) |
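
The Pseudocode row above points to Algorithm 1 in the paper, but the table only quotes the algorithm's name. The following is a minimal, generic offline goal-conditioned actor-critic training-loop skeleton, not the paper's actual update rules; `sample_batch`, `relabel_goals`, and the `.loss(...)` methods are hypothetical placeholders standing in for whatever SMORe's objectives actually are.

```python
# Generic offline GCRL training-loop skeleton (NOT the paper's Algorithm 1).
# `sample_batch`, `relabel_goals`, and the `.loss(...)` methods are hypothetical.
import torch

def train(score_model, policy, dataset, num_steps=1_000_000,
          batch_size=512, lr=3e-4):
    score_opt = torch.optim.Adam(score_model.parameters(), lr=lr)
    policy_opt = torch.optim.Adam(policy.parameters(), lr=lr)

    for step in range(num_steps):
        # Sample (s, a, s', g) tuples; offline GCRL methods commonly
        # relabel goals with future states from the same trajectory.
        batch = sample_batch(dataset, batch_size)
        batch = relabel_goals(batch)

        # 1) Fit the score/value model on the offline batch.
        score_loss = score_model.loss(batch)
        score_opt.zero_grad()
        score_loss.backward()
        score_opt.step()

        # 2) Extract a policy that maximizes the learned score.
        policy_loss = policy.loss(batch, score_model)
        policy_opt.zero_grad()
        policy_loss.backward()
        policy_opt.step()
```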
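
The Open Datasets row describes mixing a D4RL random or medium dataset with 30 episodes of expert data. Below is a minimal sketch of that recipe, assuming D4RL v2 task names (e.g. `halfcheetah-medium-v2`) and episode boundaries marked by the `terminals`/`timeouts` flags; the paper's exact construction may differ.

```python
# Sketch: mix a D4RL medium dataset with 30 expert episodes.
# Task names and episode-splitting details are assumptions, not from the paper.
import gym
import d4rl  # registers D4RL environments on import
import numpy as np

def load(name):
    return gym.make(name).get_dataset()

def episode_ends(ds):
    """Indices of transitions where an episode ends (terminal or timeout)."""
    done = np.logical_or(ds["terminals"], ds["timeouts"])
    return np.where(done)[0]

medium = load("halfcheetah-medium-v2")
expert = load("halfcheetah-expert-v2")

# Keep the first 30 expert episodes and append them to the medium data.
end = episode_ends(expert)[29] + 1
keys = ["observations", "actions", "rewards", "terminals", "timeouts"]
mixed = {k: np.concatenate([medium[k], expert[k][:end]]) for k in keys}
```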
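
For reference, the Table 4 and Table 5 values quoted in the Experiment Setup row can be collected into a single config; the key names below are illustrative, not taken from the authors' code.

```python
# SMORe hyperparameters as reported in Table 4 of the paper.
# Key names are illustrative, not the authors' own.
SMORE_CONFIG = {
    "policy_lr": 3e-4,
    "value_lr": 3e-4,
    "mlp_layers": (256, 256),
    "batch_size": 512,
    "mixture_ratio_beta": 0.5,
    "policy_temperature_alpha": 3.0,
}

# Image-observation GCRL settings (Table 5, following Zheng et al., 2023).
IMAGE_GCRL_CONFIG = {
    "batch_size": 2048,
    "num_training_epochs": 300,
    "lr": 3e-4,
}
```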