Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning

Authors: Hongjoon Ahn, Heewoong Choi, Jisu Han, Taesup Moon

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally show that the high-level policy learned using the OTA value function achieves strong performance on complex tasks from OGBench [32], a recently proposed offline GCRL benchmark, including maze navigation and visual robotic manipulation environments.
Researcher Affiliation | Academia | Hongjoon Ahn (1), Heewoong Choi (1), Jisu Han (2), Taesup Moon (1, 2, 3). (1) Department of Electrical and Computer Engineering (ECE), Seoul National University; (2) Interdisciplinary Program in Artificial Intelligence (IPAI), Seoul National University; (3) ASRI / INMC, Seoul National University. EMAIL
Pseudocode | No | The paper describes the methodology in narrative form, focusing on equations and explanations rather than structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/otav/ota-v
Open Datasets | Yes | We use OGBench [32], a recently proposed offline GCRL benchmark designed for realistic environments, long-horizon scenarios, and multi-goal evaluation.
Dataset Splits | No | The paper describes the evaluation protocol using 'five fixed task goals provided by OGBench' with '50 rollouts per goal' for measuring success rates, but it does not explicitly detail the training, validation, or test splits of the main offline dataset used for model training.
Hardware Specification | Yes | All experiments are conducted using NVIDIA RTX A5000 and A6000 GPUs.
Software Dependencies | No | The paper states that OTA was implemented 'on top of the official implementation of OGBench [32]' but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | We use goal-sampling distribution for value and policy learning, following OGBench. Data sampling scheme is based on HER [1], taking three different goal-sampling distributions... Task-specific hyperparameters are organized in Table 4, where hyperparameters are described in Equation (1) to Equation (4). Table 3 lists common hyperparameters for OTA including Learning rate, Optimizer, Minibatch size, Total gradient steps, MLP dimensions, Activation function, Target network smoothing coefficient, Discount factor γ, Image augmentation probability, Policy and Value sampling ratios. Table 4 provides task-specific hyperparameters like βh, βℓ, k, and n for different environments.
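
The evaluation protocol quoted under Dataset Splits (five fixed task goals, 50 rollouts per goal) can be illustrated with a minimal sketch of how a success rate might be aggregated. This is not the authors' or OGBench's code: the function `success_rate`, the callback `run_rollout`, and the stand-in `fake_rollout` are hypothetical names introduced only for illustration, and averaging the per-goal rates is an assumption about how the numbers are combined.

```python
from statistics import mean

NUM_GOALS = 5           # "five fixed task goals provided by OGBench"
ROLLOUTS_PER_GOAL = 50  # "50 rollouts per goal"


def success_rate(run_rollout, num_goals=NUM_GOALS,
                 rollouts_per_goal=ROLLOUTS_PER_GOAL):
    """Return (overall success rate, list of per-goal success rates).

    run_rollout(goal_index, rollout_index) -> bool is a hypothetical
    callback that runs one episode and reports whether the goal was reached.
    """
    per_goal = []
    for g in range(num_goals):
        outcomes = [bool(run_rollout(g, r)) for r in range(rollouts_per_goal)]
        per_goal.append(sum(outcomes) / rollouts_per_goal)
    # Assumption: the reported number is the mean of the per-goal rates.
    return mean(per_goal), per_goal


def fake_rollout(goal_index, rollout_index):
    """Deterministic stand-in for a real policy rollout (illustration only).

    Goal g "succeeds" on its first 10 * (g + 1) rollouts.
    """
    return rollout_index < 10 * (goal_index + 1)


overall, per_goal = success_rate(fake_rollout)
print(per_goal, overall)
```

Because every goal receives the same number of rollouts, the mean of the per-goal rates equals the success rate pooled over all 250 rollouts, so the choice between the two does not change the result in this setting.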