An Alternative Softmax Operator for Reinforcement Learning
Authors: Kavosh Asadi, Michael L. Littman
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We introduce a variant of SARSA algorithm that, by utilizing the new operator, computes a Boltzmann policy with a state-dependent temperature parameter. We show that the algorithm is convergent and that it performs favorably in practice." and "We present three additional experiments." |
| Researcher Affiliation | Academia | 1Brown University, USA. |
| Pseudocode | Yes | Algorithm 1 SARSA with Boltzmann softmax policy (a hedged sketch of the policy computation appears after this table) |
| Open Source Code | No | No statement providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | "We used the lunar lander domain, from Open AI Gym (Brockman et al., 2016) as our benchmark." and "We evaluated SARSA on the multi-passenger taxi domain introduced by Dearden et al. (1998)." |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It mentions using environments like Lunar Lander and Multi-passenger Taxi, but no explicit data partitioning information is given. |
| Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory specifications, or cloud instance types) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | "We used Keras (Chollet, 2015) and Theano (Team et al., 2016) to implement the neural network architecture." (The years refer to publication dates, not the software versions used for the experiments.) |
| Experiment Setup | Yes | "We used the Adam algorithm (Kingma & Ba, 2014) with α = 0.005 and the other parameters as suggested by the paper." and "A batch episode size of 10 was used, as we had stability issues with smaller episode batch sizes." (A sketch of these optimizer settings follows the policy sketch below.) |
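Since the paper's pseudocode (Algorithm 1) centers on a Boltzmann policy whose inverse temperature β is chosen per state via the mellowmax operator, a minimal sketch of that computation may help readers checking reproducibility. This is not the authors' released code (none was found); it assumes NumPy and SciPy, an illustrative ω = 5.0, and SciPy's `brentq` as the root-finder (the paper suggests a method such as Brent's):

```python
import numpy as np
from scipy.optimize import brentq

def mellowmax(q, omega=5.0):
    """mm_omega(q) = log(mean(exp(omega * q))) / omega,
    computed with a max-shift for numerical stability."""
    q = np.asarray(q, dtype=np.float64)
    c = q.max()
    return c + np.log(np.mean(np.exp(omega * (q - c)))) / omega

def mellowmax_policy(q, omega=5.0):
    """Boltzmann policy with a state-dependent inverse temperature beta,
    chosen as the root of sum_i softmax_beta(q)_i * (q_i - mm_omega(q)) = 0."""
    q = np.asarray(q, dtype=np.float64)
    adv = q - mellowmax(q, omega)        # advantages relative to mellowmax
    if np.allclose(adv, 0.0):            # all actions equal: uniform policy
        return np.full(q.shape, 1.0 / q.size)

    def expected_advantage(beta):
        w = np.exp(beta * adv - np.max(beta * adv))  # stable softmax weights
        return np.dot(w / w.sum(), adv)

    # mm lies strictly between mean(q) and max(q) for non-constant q, so
    # expected_advantage changes sign on this bracket and a root exists.
    beta = brentq(expected_advantage, 1e-10, 1e6)
    w = np.exp(beta * adv - np.max(beta * adv))
    return w / w.sum()
```

For example, `mellowmax_policy([0.1, 0.2, 0.2])` returns a probability vector over the three actions that sums to 1, with more weight on the two higher-valued actions.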
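The Experiment Setup row reports Adam with α = 0.005 (other parameters at the defaults of Kingma & Ba, 2014) on the Lunar Lander domain. A minimal sketch of those settings is below; it assumes the modern `tf.keras` API rather than the 2015-era Keras/Theano stack the authors used (argument names differ), and the network shape is an assumption sized for Gym's LunarLander-v2 (8-dimensional state, 4 discrete actions), not the paper's reported architecture:

```python
from tensorflow import keras

q_network = keras.Sequential([
    keras.Input(shape=(8,)),                    # Lunar Lander observation
    keras.layers.Dense(16, activation="relu"),  # hidden size is an assumption
    keras.layers.Dense(4),                      # one Q-value per discrete action
])

# alpha = 0.005 as reported; the remaining Adam parameters are left at the
# defaults suggested by Kingma & Ba (2014).
q_network.compile(optimizer=keras.optimizers.Adam(learning_rate=0.005),
                  loss="mse")
```

The reported batch episode size of 10 would govern how many episodes are collected between updates in the training loop, which is outside the scope of this sketch.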