Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Score Models for Offline Goal-Conditioned Reinforcement Learning
Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on the fully offline GCRL benchmark composed of robot manipulation and locomotion tasks, including high-dimensional observations, show that SMORe can outperform state-of-the-art baselines by a significant margin. Our experiments study the effectiveness of proposed GCRL algorithm SMORe on a set of simulated benchmarks against other GCRL methods that employ behavior cloning, RL with sparse reward, and contrastive learning. |
| Researcher Affiliation | Collaboration | Harshit Sikchi (θ), Rohan Chitnis (ϕ), Ahmed Touati (ϕ), Alborz Geramifard (ϕ), Amy Zhang (θ, ϕ), Scott Niekum (ψ). Affiliations: θ University of Texas at Austin, ϕ Meta AI, ψ UMass Amherst |
| Pseudocode | Yes | Algorithm 1: SMORe |
| Open Source Code | Yes | Project page (Code and Videos): hari-sikchi.github.io/smore/ |
| Open Datasets | Yes | For locomotion tasks, we generate our dataset using the D4RL benchmark (Fu et al., 2020), combining a random or medium dataset with 30 episodes of expert data. The datasets used from D4RL are licensed under Apache 2.0. |
| Dataset Splits | No | The paper mentions training iterations and hyperparameter searches but does not explicitly provide training/validation/test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used for the experiments. |
| Experiment Setup | Yes | Table 4: Hyperparameters for SMORe. Policy learning rate 3e-4, Value learning rate 3e-4, MLP layers (256,256), Batch Size 512, Mixture ratio β 0.5, Policy temperature (α) 3.0. Table 5: Hyperparameters for image-observation GCRL from Zheng et al. (2023). batch size 2048, number of training epochs 300, learning rate 3e-4. |
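
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone attempting a reproduction. The sketch below is illustrative only: the class and field names (`SmoreConfig`, `ImageGcrlConfig`, `policy_lr`, and so on) are assumptions made for this example and are not taken from the authors' code; only the numeric values come from the quoted Tables 4 and 5.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class SmoreConfig:
    """Values quoted from Table 4; field names are hypothetical."""
    policy_lr: float = 3e-4                  # policy learning rate
    value_lr: float = 3e-4                   # value learning rate
    mlp_layers: Tuple[int, int] = (256, 256) # hidden layer widths
    batch_size: int = 512
    mixture_ratio_beta: float = 0.5          # mixture ratio (beta)
    policy_temperature_alpha: float = 3.0    # policy temperature (alpha)


@dataclass(frozen=True)
class ImageGcrlConfig:
    """Values quoted from Table 5 (image observations); field names are hypothetical."""
    batch_size: int = 2048
    num_epochs: int = 300
    learning_rate: float = 3e-4


if __name__ == "__main__":
    # Print both configurations as plain dictionaries for logging.
    print(asdict(SmoreConfig()))
    print(asdict(ImageGcrlConfig()))
```

Because the scorecard reports no hardware or software versions, a configuration like this covers only part of what a reproduction would need.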