An Alternative Softmax Operator for Reinforcement Learning

Authors: Kavosh Asadi, Michael L. Littman

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We introduce a variant of SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice." and "We present three additional experiments." (A hedged sketch of this policy appears after the table.)
Researcher Affiliation | Academia | "Brown University, USA."
Pseudocode | Yes | "Algorithm 1 SARSA with Boltzmann softmax policy" (see the SARSA sketch after the table)
Open Source Code | No | No statement providing concrete access to the source code for the methodology described in this paper was found.
Open Datasets | Yes | "We used the lunar lander domain, from Open AI Gym (Brockman et al., 2016) as our benchmark." and "We evaluated SARSA on the multi-passenger taxi domain introduced by Dearden et al. (1998)." (see the experiment-setup sketch after the table)
Dataset Splits | No | The paper reports no training, validation, or test splits. Its experiments run in interactive environments (Lunar Lander, multi-passenger taxi) rather than on fixed datasets, so no explicit data partitioning applies.
Hardware Specification | No | No specific hardware details (CPU or GPU models, memory, or cloud instance types) used to run the experiments are mentioned in the paper.
Software Dependencies | Yes | "We used Keras (Chollet, 2015) and Theano (Team et al., 2016) to implement the neural network architecture." (The cited years are publication dates, not the software versions used for the experiments.)
Experiment Setup | Yes | "We used the Adam algorithm (Kingma & Ba, 2014) with α = 0.005 and the other parameters as suggested by the paper. A batch episode size of 10 was used, as we had stability issues with smaller episode batch sizes." (see the experiment-setup sketch after the table)
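
For readers reconstructing the method from the table: the paper's alternative operator (called mellowmax) is mm_ω(q) = log((1/n) Σ_i exp(ω q_i)) / ω, and its SARSA variant acts with a Boltzmann policy whose inverse temperature β is solved per state so that the Boltzmann-weighted mean of the action values equals the mellowmax value. The sketch below is a minimal Python/NumPy/SciPy illustration of those two definitions, not the authors' code; the function names and the root-finding bracket are our assumptions.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.optimize import brentq

def mellowmax(q, omega=5.0):
    # mm_omega(q) = log( mean( exp(omega * q) ) ) / omega, computed stably.
    q = np.asarray(q, dtype=float)
    return (logsumexp(omega * q) - np.log(len(q))) / omega

def mellowmax_policy(q, omega=5.0):
    """Boltzmann policy with a state-dependent temperature: solve for the
    inverse temperature beta at which the Boltzmann-weighted mean of q
    equals mm_omega(q). The bracket [1e-10, 1e5] is a guess for illustration."""
    q = np.asarray(q, dtype=float)
    adv = q - mellowmax(q, omega)            # advantages w.r.t. the mellowmax value
    if np.allclose(adv, 0.0):                # all actions equal: uniform policy
        return np.full(len(q), 1.0 / len(q))

    def residual(beta):
        w = np.exp(beta * adv - np.max(beta * adv))   # overflow-safe weights
        return np.dot(w / w.sum(), adv)               # zero at the right beta

    beta = brentq(residual, 1e-10, 1e5)      # residual is negative at 0+, positive at infinity
    w = np.exp(beta * adv - np.max(beta * adv))
    return w / w.sum()
```

As a sanity check, for q = [1.0, 2.0, 3.0] the value mellowmax(q) lies strictly between the mean (2.0) and the max (3.0), approaching the mean as ω → 0 and the max as ω → ∞.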
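
The Pseudocode row quotes the caption of the paper's Algorithm 1 (SARSA with Boltzmann softmax policy). A tabular rendering of that loop, reusing mellowmax_policy from the sketch above, might look as follows; the classic Gym step API (obs, reward, done, info), the dict-of-arrays Q table, and the learning-rate and discount values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sarsa_mellowmax_episode(env, Q, omega=5.0, alpha=0.1, gamma=0.99):
    # One episode of SARSA acting via the state-dependent Boltzmann policy.
    # Q: dict-like table mapping a (hashable) state to an array of action values.
    state = env.reset()                                   # classic Gym API assumed
    action = np.random.choice(len(Q[state]),
                              p=mellowmax_policy(Q[state], omega))
    done = False
    while not done:
        next_state, reward, done, _ = env.step(action)    # (obs, reward, done, info)
        if done:
            bootstrap = 0.0                               # no value beyond terminal states
            next_action = None
        else:
            next_action = np.random.choice(len(Q[next_state]),
                                           p=mellowmax_policy(Q[next_state], omega))
            bootstrap = gamma * Q[next_state][next_action]
        Q[state][action] += alpha * (reward + bootstrap - Q[state][action])
        state, action = next_state, next_action
    return Q
```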
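
The Open Datasets, Software Dependencies, and Experiment Setup rows together pin down most of the lunar lander configuration. A minimal reconstruction with the era-appropriate stack (classic OpenAI Gym; Keras on the Theano backend) could look like the sketch below. The environment ID, hidden-layer sizes, and activations are assumptions the excerpt does not confirm; the Adam step size α = 0.005, with the remaining parameters at the Kingma & Ba (2014) defaults, is quoted from the paper.

```python
import gym
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# "LunarLander-v2" is an assumption: the paper only says
# "the lunar lander domain, from Open AI Gym (Brockman et al., 2016)".
env = gym.make("LunarLander-v2")

# Hidden-layer sizes and activations are placeholders, not from the paper.
model = Sequential([
    Dense(64, activation="relu", input_shape=env.observation_space.shape),
    Dense(64, activation="relu"),
    Dense(env.action_space.n, activation="linear"),   # one action value per action
])

# Adam with alpha = 0.005 as reported; old Keras spells the learning rate
# "lr", while newer releases use "learning_rate".
model.compile(optimizer=Adam(lr=0.005), loss="mse")
```

In Keras releases of that period, the Theano backend is selected via the KERAS_BACKEND environment variable or the "backend" field in keras.json, which is why no backend choice appears in the code itself.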