Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Resmax: An Alternative Soft-Greedy Operator for Reinforcement Learning

Authors: Erfan Miahi, Revan MacQueen, Alex Ayoub, Abbas Masoumzadeh, Martha White

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically validate that resmax is comparable to or outperforms ε-greedy and softmax across a variety of environments in tabular and deep RL.
Researcher Affiliation Academia Erfan Miahi EMAIL Department of Computing Science University of Alberta Revan MacQueen EMAIL Department of Computing Science University of Alberta/Amii Alex Ayoub EMAIL Department of Computing Science University of Alberta Abbas Masoumzadeh EMAIL University of Alberta Martha White EMAIL Department of Computing Science University of Alberta
Pseudocode No The paper describes steps in regular paragraph text and mathematical equations, but does not include a dedicated pseudocode or algorithm block.
Open Source Code No The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets Yes Consider, for instance, the River Swim environment (Strehl & Littman, 2008) ... We conduct experiments in the deep RL setting (1890 experiments) across both easy and hard exploration Atari 2600 environments (Bellemare et al., 2013).
Dataset Splits No The paper does not provide specific training, testing, or validation dataset splits. Experiments are conducted in reinforcement learning environments (River Swim, Atari), which typically do not involve pre-defined dataset splits in the supervised learning sense, and no details on how episodes or environments were partitioned for evaluation are given.
Hardware Specification Yes The possible options were 2.1 GHz Intel CPUs with model numbers E5-2683 V4 Broadwell, E7-4809 V4 Broadwell, or Platinum 8160F Skylake, as well as 2.4 GHz Intel Platinum 8260 Cascade Lake. For GPU experiments, we used a V100 Volta GPU.
Software Dependencies No For implementing the neural networks we used the PyTorch framework.
Experiment Setup Yes In the deep RL setting, we use the DQN algorithm. We chose a fixed set of parameters that works well across all three benchmark environments. These parameters are presented in Appendix Table 1. We swept over three different step sizes across all our experiments: 0.0005, 0.0001, and 0.00005. ...
Table 1: The fixed parameters used to run DQN experiments.
Optimizer: Adam
β1: 0.9
β2: 0.999
ε for Adam: 10⁻⁸
Batch size: 64
Buffer size: 100,000
Number of training steps per iteration: 1
Target network update frequency: 1,000
Number of steps before learning starts: 50,000
γ: 0.99
... For implementing the neural networks we used the PyTorch framework. We used a convolutional neural network as the function approximator, with three convolutional layers followed by two fully connected layers. ReLU is used as the activation function for these networks. The convolutional layers have 32, 64, and 64 filters; kernel sizes of 8, 4, and 3; and strides of 4, 2, and 1, respectively. The first fully connected layer includes 512 neurons, and the second one outputs the action values. We use uniform Xavier initialization to initialize the weights of the network.
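The convolutional specs quoted above (filters 32/64/64, kernels 8/4/3, strides 4/2/1) can be sanity-checked with the standard valid-convolution output-size formula. A minimal sketch, assuming an 84×84 input frame (the usual Atari preprocessing; the excerpt does not state the input resolution, so this is an assumption):

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

# Layer specs from the quoted setup: (filters, kernel size, stride).
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

def feature_size(input_size=84):
    # NOTE: the 84x84 input is an assumption (standard Atari preprocessing),
    # not stated in the excerpt above.
    size = input_size
    for _, kernel, stride in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
    channels = CONV_LAYERS[-1][0]
    return channels * size * size  # flattened features fed to the FC layer

print(feature_size())
```

Under this assumption the spatial sizes shrink 84 → 20 → 9 → 7, giving 64·7·7 = 3136 flattened features into the 512-unit fully connected layer, which matches the classic DQN architecture shape.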