Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

Authors: Michal Nauman, Marek Cygan, Carmelo Sferrazza, Aviral Kumar, Pieter Abbeel

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test our approach on 7 multi-task benchmarks with over 280 unique tasks, spanning high degree-of-freedom humanoid control and discrete vision-based RL. We find that, despite its simplicity, the proposed approach leads to state-of-the-art single and multi-task performance, as well as sample-efficient transfer to new tasks.
Researcher Affiliation	Academia	Michal Nauman1,2 Marek Cygan2,3 Carmelo Sferrazza1 Aviral Kumar4 Pieter Abbeel1,5 Pieter Abbeel holds concurrent appointments as a Professor at UC Berkeley and as an Amazon Scholar. This paper describes work performed at UC Berkeley and is not associated with Amazon. Marek Cygan was partially supported by National Science Centre, Poland, under the grant 2024/54/E/ST6/00388. We also gratefully acknowledge the Polish high-performance computing infrastructure, PLGrid (HPC Center: ACK Cyfronet AGH), for providing computational resources and support under grant no. PLG/2024/017817.
Pseudocode	No	The paper describes methods and architectures but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	We open-source our code under the following link: https://github.com/naumix/Bigger Regularized Categorical.
Open Datasets	Yes	Benchmarks. We consider a wide range of tasks, with a total of 283 diverse, complex control problems spanning five domains: DeepMind Control (DMC) [101], MetaWorld (MW) [121], HumanoidBench (HB) [93], Atari [10], and ShadowHand (SH) [49].
Dataset Splits	Yes	We list all the task sets considered in Appendix E. Transfer learning. In our transfer experiments, we evaluate three adaptation protocols inspired by previous work. ... The multi-task model is not trained on the transfer tasks, mimicking the train-test split used in supervised learning [14]. We report results for transfer experiments in Figures 2, 10 and 11. We list the tasks used in multi-task and transfer learning in Appendix E.
Hardware Specification	Yes	Hardware Information & Reproducibility All experiments were conducted on an NVIDIA A100 and H100 GPUs with 80GB of RAM and 16 CPU cores of AMD EPYC 7742 processor.
Software Dependencies	No	We would like to thank the Python [109], NumPy [42], Matplotlib [50], SciPy [110] and JAX [16] communities for developing tools that supported this work. The paper lists software tools used but does not specify version numbers for these components, which is required for a reproducible description of ancillary software.
Experiment Setup	Yes	We discuss our experimental setting in Section 4 and Appendix D. We detail hyperparameters in Appendix F. F Hyperparameters We detail the hyperparameters used in our experiments in Tables 3 and 4 below. As discussed in Section 4, we use a single hyperparameter configuration across all tested tasks, showcasing robustness of our approach.