Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning
Authors: Bastien Dubail, Stefan Stojanovic, Alexandre Proutiere
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical findings with experiments, and demonstrate that shifting the successor measure indeed leads to improved performance in goal-conditioned RL. |
| Researcher Affiliation | Academia | Bastien Dubail, KTH, Stockholm, Sweden, EMAIL; Stefan Stojanovic, KTH, Stockholm, Sweden, EMAIL; Alexandre Proutiere, KTH, Digital Futures, Stockholm, Sweden, EMAIL |
| Pseudocode | No | The paper describes various algorithms and methods throughout its sections, particularly in the theoretical development, but it does not present any clearly labeled pseudocode blocks or algorithms in a structured, dedicated section. |
| Open Source Code | Yes | Code available at https://github.com/stestoKTH/shift-SM |
| Open Datasets | Yes | All three mazes are discretized versions of the Maze2D environments from (23). [23] Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020. |
| Dataset Splits | No | The paper refers to a "dataset of transitions (s, a, s') collected offline", to the "number of trajectories", and to "random goals". It describes the data generation process and how many trajectories or samples are used, but it does not specify explicit training, validation, and test splits for a static dataset. |
| Hardware Specification | No | All experiments were run on a single CPU and are reproducible within a day. As mentioned in the main text, all code is available at https://github.com/stestoKTH/shift-SM. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with their version numbers within the text. It implies the use of standard reinforcement learning frameworks through its methodology but does not provide details like "Python 3.x, PyTorch 1.x, CUDA 11.x". |
| Experiment Setup | Yes | We perform experiments in the Medium Point Maze environment with 104 discrete states and 4 actions (see Figure 4 (a)). ... As shown in Figure 4 (e-f), larger shifts degrade performance when successor measures are learned via TD. This aligns with the intuition that estimating long-horizon dynamics is harder and introduces more error, particularly in low-data regimes. Finally, we assess how data efficiency depends on the shift parameter by fixing the rank to r = 40 and varying the number of samples in Figure 4 (g-h). We find that a moderate shift (k = 3) consistently yields the best performance, suggesting a trade-off: while shifting improves expressivity, its estimation must remain tractable. |