Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Authors: Hongjoon Ahn, Heewoong Choi, Jisu Han, Taesup Moon
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that the high-level policy learned using the OTA value function achieves strong performance on complex tasks from OGBench [32], a recently proposed offline GCRL benchmark, including maze navigation and visual robotic manipulation environments. |
| Researcher Affiliation | Academia | Hongjoon Ahn (1), Heewoong Choi (1), Jisu Han (2), Taesup Moon (1, 2, 3). (1) Department of Electrical and Computer Engineering (ECE), Seoul National University; (2) Interdisciplinary Program in Artificial Intelligence (IPAI), Seoul National University; (3) ASRI / INMC, Seoul National University |
| Pseudocode | No | The paper describes the methodology in narrative form, focusing on equations and explanations rather than structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/otav/ota-v |
| Open Datasets | Yes | We use OGBench [32], a recently proposed offline GCRL benchmark designed for realistic environments, long-horizon scenarios, and multi-goal evaluation. |
| Dataset Splits | No | The paper describes the evaluation protocol using 'five fixed task goals provided by OGBench' with '50 rollouts per goal' for measuring success rates, but it does not explicitly detail the training, validation, or test splits of the main offline dataset used for model training. |
| Hardware Specification | Yes | All experiments are conducted using NVIDIA RTX A5000 and A6000 GPUs. |
| Software Dependencies | No | The paper states that OTA was implemented 'on top of the official implementation of OGBench [32]' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use goal-sampling distribution for value and policy learning, following OGBench. Data sampling scheme is based on HER [1], taking three different goal-sampling distributions... Task-specific hyperparameters are organized in Table 4, where hyperparameters are described in Equation (1) to Equation (4). Table 3 lists the common hyperparameters for OTA, including learning rate, optimizer, minibatch size, total gradient steps, MLP dimensions, activation function, target network smoothing coefficient, discount factor γ, image augmentation probability, and policy and value sampling ratios. Table 4 provides task-specific hyperparameters, such as βh, βℓ, k, and n, for different environments. |
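The Experiment Setup row says goal sampling follows OGBench and HER [1] with three goal-sampling distributions, but the quote elides what those distributions are. The Python/NumPy sketch below shows one common HER-style relabeling mixture. It assumes the three sources are the current state, a future state from the same trajectory reached after a geometric number of steps, and a uniformly random dataset state. The function `sample_goals`, the weights `p_cur`, `p_traj`, `p_rand`, and the geometric offset are illustrative assumptions. This is not the OTA or OGBench code.

```python
import numpy as np


def sample_goals(
    idxs: np.ndarray,
    terminal_idx: np.ndarray,
    p_cur: float,
    p_traj: float,
    p_rand: float,
    discount: float,
    rng: np.random.Generator,
) -> np.ndarray:
    """Relabel goals for states `idxs` with a hypothetical three-way HER-style mixture.

    idxs: indices of the sampled (flattened) dataset states.
    terminal_idx: for every dataset state, the index of the last state of its trajectory.
    """
    if not np.isclose(p_cur + p_traj + p_rand, 1.0):
        raise ValueError("mixture weights must sum to 1")
    n = len(idxs)
    num_states = len(terminal_idx)

    # Source 0 (assumed): the current state itself.
    cur_goals = idxs
    # Source 1 (assumed): a future state in the same trajectory, reached after a
    # geometric number of steps (>= 1), clipped to the trajectory's final state.
    offsets = rng.geometric(p=1.0 - discount, size=n)
    traj_goals = np.minimum(idxs + offsets, terminal_idx[idxs])
    # Source 2 (assumed): a state drawn uniformly from the whole dataset.
    rand_goals = rng.integers(0, num_states, size=n)

    # Pick one source per sample according to the (hypothetical) mixture weights.
    source = rng.choice(3, size=n, p=[p_cur, p_traj, p_rand])
    return np.where(source == 0, cur_goals, np.where(source == 1, traj_goals, rand_goals))


# Usage: two toy trajectories of lengths 5 and 3 (states 0-4 and 5-7).
lengths = np.array([5, 3])
terminal_idx = np.repeat(np.cumsum(lengths) - 1, lengths)  # [4 4 4 4 4 7 7 7]
rng = np.random.default_rng(0)
idxs = rng.integers(0, len(terminal_idx), size=16)
goals = sample_goals(idxs, terminal_idx, 0.2, 0.5, 0.3, discount=0.99, rng=rng)
```

Clipping the future offset at the trajectory's last state keeps each relabeled goal reachable from its sampled state. Using separate weight triples for value and policy learning could correspond to the "Policy and Value sampling ratios" listed in Table 3, but the actual values are not reported in this summary.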