Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Physics-informed Value Learner for Offline Goal-Conditioned Reinforcement Learning
Authors: Vittorio Giammarino, Ruiqi Ni, Ahmed Qureshi
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we analyze the effects of the Eikonal regularizer in (9) on the GCVF estimation problem. Specifically, we will perform an ablation over different designs of speed profiles S(s), compare the Eikonal regularizer with an HJB regularizer, and analyze value functions learned with and without our Eikonal term. Then, we compare the performance obtained by our Eikonal-regularized algorithm, Eik-HIQL, against the SOTA algorithms for Offline GCRL. The experiments in this section are conducted on the environments in Fig. 2. A table summarizing the most relevant hyperparameter values is provided in Appendix D. Table 1: Summary of the speed profiles ablation. All agents are trained for 100,000 training steps using 10 seeds. We report the mean and standard deviation across seeds for the best evaluation achieved during training. For each seed, evaluations are conducted over 5 different random goals, as designed in Park et al. [11], with the learned policy tested for 50 episodes per goal. Results within 95% of the best value are written in bold. Table 2: Complete comparison between Eik-HIQL and the Offline GCRL baselines. Agents are trained for 100,000 steps on pointmaze tasks and 1 million steps on the remaining tasks, each using 10 seeds. The evaluation follows the methodology described in Table 1. We report the mean and standard deviation across seeds for the best evaluation achieved during training. Results within 95% of the best value are written in bold, and rows are highlighted when the Eikonal regularizer improves performance by 100% or more compared to the non-regularized HIQL performance. |
| Researcher Affiliation | Academia | Vittorio Giammarino, Ruiqi Ni and Ahmed H. Qureshi Department of Computer Science Purdue University EMAIL |
| Pseudocode | Yes | The full pseudocode for Eik-HIQL as well as a JAX [49] implementation showing how to compute the gradient ∇_s V_{θ_V} in (9) are provided in Appendix D, respectively Algorithm 1 and Algorithm 2. |
| Open Source Code | Yes | Code is available at link¹. ¹https://github.com/VittorioGiammarino/Eik-HIQL |
| Open Datasets | Yes | Our evaluation, conducted on the challenging OGbench benchmark [11], compares Eik-HIQL against Quasimetric RL (QRL) [12], Contrastive RL (CRL) [13], and the standard HIQL baseline. [11] Seohong Park, Kevin Frans, Benjamin Eysenbach, and Sergey Levine. Ogbench: Benchmarking offline goal-conditioned rl. arXiv preprint arXiv:2410.20092, 2024. |
| Dataset Splits | No | In the offline setting, the learning agent must optimize J(π) using only a static, offline dataset D, which comprises trajectories of the form τ = (s_0, a_0, s_1, s_2, ..., s_T). For each seed, evaluations are conducted over 5 different random goals, as designed in Park et al. [11], with the learned policy tested for 50 episodes per goal. The paper describes the evaluation methodology and the use of a static offline dataset D, but does not provide specific training/validation/test splits of the dataset itself. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA RTX 3090 GPU (24 GB VRAM), using a local server equipped with a 12th Gen Intel i7-12700F CPU, 32 GB RAM. No cloud services or compute clusters were used. |
| Software Dependencies | No | The full pseudocode for Eik-HIQL as well as a JAX [49] implementation showing how to compute the gradient ∇_s V_{θ_V} in (9) are provided in Appendix D, respectively Algorithm 1 and Algorithm 2. Finally, Table 3 reports the hyperparameter values most commonly used in our experiments. For more implementation details, refer to our GitHub repository. Table 3 also lists 'Optimizer Adam'. While JAX and Adam are mentioned, specific version numbers for any software dependencies are not provided. |
| Experiment Setup | Yes | Table 3: Hyperparameter values for Eik-HIQL. Decay rate (λ): 1.0; Minimum speed (S_min): 0.1; Discount factor (γ): 0.99; Batch size (B): 1024; Optimizer: Adam; Learning rates (α_V, α_hi, α_lo): 3 × 10⁻⁴; Target update rate (τ): 0.005; Expectile factor (ι): 0.7; Inverse temperature parameter (β): 3.0 |
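
The reporting protocol quoted in the Research Type row (best evaluation per seed, mean and standard deviation across seeds, bold for results within 95% of the best value) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the method names, and the numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the quoted text does not say which estimator is used.

```python
from statistics import mean, stdev


def best_per_seed(curves):
    """For each seed, keep the best evaluation score reached during training."""
    return [max(curve) for curve in curves]


def summarize(curves):
    """Mean and standard deviation across seeds of the per-seed best score.

    Assumption: sample standard deviation (statistics.stdev).
    """
    best = best_per_seed(curves)
    return mean(best), stdev(best)


def bold_mask(means, threshold=0.95):
    """Mark every method whose mean is within `threshold` of the best mean."""
    top = max(means.values())
    return {name: value >= threshold * top for name, value in means.items()}


# Hypothetical evaluation curves: one list of success rates per seed.
curves = {
    "method_a": [[0.1, 0.5, 0.4], [0.2, 0.6, 0.3]],
    "method_b": [[0.5, 0.58], [0.5, 0.56]],
    "method_c": [[0.1, 0.2], [0.3, 0.1]],
}

summary = {name: summarize(c) for name, c in curves.items()}
means = {name: m for name, (m, _) in summary.items()}
bold = bold_mask(means)

for name, (m, s) in summary.items():
    marker = "**" if bold[name] else ""
    print(f"{name}: {marker}{m:.3f} ± {s:.3f}{marker}")
```

In this toy data, `method_a` and `method_b` are both marked bold because their means fall within 95% of the best mean, while `method_c` is not.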