Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Forecasting in Offline Reinforcement Learning for Non-stationary Environments

Authors: Suzan Ece Ada, Georg Martius, Emre Ugur, Erhan Oztop

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines.
Researcher Affiliation | Academia | 1 Bogazici University, Türkiye; 2 University of Tübingen, Germany; 3 Ozyegin University, Türkiye; 4 Osaka University, Japan
Pseudocode | Yes | Algorithm 1 Candidate Selection
Open Source Code | No | We plan to provide open access to code in the future.
Open Datasets | Yes | We evaluate FORL across navigation and manipulation tasks in D4RL [15] and OGBench [21] offline RL environments, each augmented with five real-world non-stationarity domains sourced from [22].
Dataset Splits | Yes | Training (Offline Stationary MDP): We begin with an episodic, stationary Markov Decision Process (MDP) $\mathcal{M}_{\text{train}} = (\mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \rho_0)$, where the initial state distribution $\rho_0$ is a uniform distribution over the state space $\mathcal{S}$. We only have access to an offline RL dataset $\mathcal{D} = \{(s_t^k)\}$ with $k$ transitions collected from this MDP. Crucially, our FORL diffusion model and a diffusion policy [14] are trained offline using this dataset, such as the standard D4RL benchmark [15], without making any assumptions about how the environment might become non-stationary at test time.
Hardware Specification | No | The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).
Software Dependencies | No | The paper does not explicitly provide specific version numbers for key software components such as Python, PyTorch, or CUDA used for their implementation. It only references a library in a citation ([22] Gluon TS) but not its own development stack.
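Experiment Setup | Yes | We use the noise prediction model [11] with the reverse diffusion chain $s_t^{(n-1)}$ formulated as $\frac{s_t^{(n)}}{\sqrt{\alpha^{(n)}}} - \frac{1-\alpha^{(n)}}{\sqrt{\alpha^{(n)}\,(1-\bar{\alpha}^{(n)})}}\,\epsilon_\theta\big(s_t^{(n)}, \tau_{(t,w)}, n\big) + \sqrt{1-\alpha^{(n)}}\,\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ for $n = N, \ldots, 1$, and $\epsilon = 0$ for $n = 1$ [11]. ... and the weighting factors $\alpha^{(n)} = \exp\!\left(-\left(\frac{\beta_{\min}}{N} + (\beta_{\max} - \beta_{\min})\,\frac{2n-1}{2N^2}\right)\right)$, where $\beta_{\max} = 10$ and $\beta_{\min} = 0.1$ are parameters introduced for empirical reasons [19]. ... Results average 5 seeds, unless noted.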
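
The schedule and reverse step quoted in the Experiment Setup row can be illustrated with a small sketch. It is not the authors' code (the page lists the code as not yet released). It assumes the standard DDPM reverse update and an exponential schedule for the weighting factors with beta_min = 0.1 and beta_max = 10. The function names `alpha_schedule` and `reverse_step` are hypothetical, and `eps_pred` is a stand-in for the output of the noise prediction network, which is not modeled here.

```python
import math
import random


def alpha_schedule(num_steps, beta_min=0.1, beta_max=10.0):
    """Return [alpha^(1), ..., alpha^(N)] for the assumed exponential schedule."""
    alphas = []
    for n in range(1, num_steps + 1):
        exponent = (beta_min / num_steps
                    + (beta_max - beta_min) * (2 * n - 1) / (2 * num_steps ** 2))
        alphas.append(math.exp(-exponent))
    return alphas


def reverse_step(s_n, eps_pred, n, alphas, rng):
    """One DDPM-style reverse step s^(n) -> s^(n-1).

    s_n and eps_pred are lists of floats of equal length; eps_pred is a
    placeholder for the network's noise estimate (an assumption of this sketch).
    """
    alpha_n = alphas[n - 1]
    alpha_bar_n = math.prod(alphas[:n])  # cumulative product up to step n
    coef = (1.0 - alpha_n) / math.sqrt(1.0 - alpha_bar_n)
    sigma = math.sqrt(1.0 - alpha_n)
    out = []
    for x, e in zip(s_n, eps_pred):
        # No noise is added on the final step (n = 1).
        noise = rng.gauss(0.0, 1.0) if n > 1 else 0.0
        out.append((x - coef * e) / math.sqrt(alpha_n) + sigma * noise)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    alphas = alpha_schedule(num_steps=5)
    state = [rng.gauss(0.0, 1.0) for _ in range(2)]  # start from pure noise
    for n in range(5, 0, -1):
        zero_eps = [0.0, 0.0]  # dummy noise estimate, for illustration only
        state = reverse_step(state, zero_eps, n, alphas, rng)
    print(state)
```

With a trained network, `eps_pred` would be recomputed at every step from the current noisy state, the conditioning input, and the step index n; the dummy zero estimate above only exercises the arithmetic.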