Fear and Hope Emerge from Anticipation in Model-Based Reinforcement Learning
Authors: Thomas Moerland, Joost Broekens, Catholijn Jonker
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Test results in three known RL domains illustrate emotion dynamics, dependencies on policy and environmental stochasticity, and plausibility in individual Pacman game settings. |
| Researcher Affiliation | Academia | Thomas Moerland, Joost Broekens, and Catholijn Jonker Delft University of Technology Mekelweg 4, 2628 CD Delft, The Netherlands {T.M.Moerland,D.J.Broekens,C.M.Jonker}@tudelft.nl |
| Pseudocode | Yes | Algorithm 1: Model-based reinforcement learning with emotion simulation. (A hedged code sketch of such a loop follows this table.) |
| Open Source Code | No | The paper does not provide any statements or links indicating that open-source code for the methodology is available. |
| Open Datasets | Yes | We test the emotion models in three scenarios: the Taxi domain (4.1) for hope, joy and distress, the Cliff Walking scenario (4.2) for fear, and finally Pacman (4.3) for plausibility of signals in a more complex and partially observable task. The Taxi domain (figure 1, introduced in [Dietterich, 1998]). In the Cliff Walking scenario (figure 3, adopted from p.149 of [Sutton and Barto, 1998])... Pacman (figure 5, based on [Sequeira et al., 2014]). |
| Dataset Splits | No | The paper describes the experimental scenarios and training process (e.g., 'Pacman interact with the environment for 100000 iterations'), but it does not specify explicit training/validation/test dataset splits, which are more common in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | Results for UCT(N=300, dmax=7). The ε=0.05 agent has less exploration, which makes it more hopeful about reaching the target. Results for UCT(N=18, dmax=4) runs on a converged model. We modify equation 4 from a strict maximization to an ε-greedy policy with ε=0.10. Pacman interacts with the environment for 100000 iterations (ε linearly decreasing from 1 to 0.05 in the first 30000 iterations). A sketch of this ε schedule follows the table. |
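The Pseudocode row above cites Algorithm 1 (model-based reinforcement learning with emotion simulation), but no code is released. The following is a minimal, hypothetical sketch of how such an agent could be structured: a tabular sample model, random forward rollouts as a stand-in for anticipation, and hope/fear/joy/distress signals read off anticipated and realized returns. All class, method, and parameter names are ours, and the emotion definitions are simplified illustrations, not the paper's exact appraisal formulas.

```python
import random
from collections import defaultdict


class EmotiveModelBasedAgent:
    """Illustrative agent: learns a tabular sample model, forward-simulates
    to anticipate outcomes, and derives emotion-like signals from anticipation.
    Names and definitions are assumptions, not the paper's algorithm."""

    def __init__(self, actions, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.gamma = gamma
        self.epsilon = epsilon
        self.value = defaultdict(float)   # state -> estimated value
        self.model = defaultdict(list)    # (state, action) -> observed (reward, next_state)

    def update_model(self, s, a, r, s_next):
        # Record an observed transition in a simple sample model.
        self.model[(s, a)].append((r, s_next))

    def simulate_returns(self, s, depth=4, rollouts=20):
        # Sample forward trajectories from the learned model and collect
        # their discounted returns (uniform rollouts stand in for UCT).
        returns = []
        for _ in range(rollouts):
            state, ret, discount = s, 0.0, 1.0
            for _ in range(depth):
                a = random.choice(self.actions)
                outcomes = self.model.get((state, a))
                if not outcomes:
                    break
                r, state = random.choice(outcomes)
                ret += discount * r
                discount *= self.gamma
            returns.append(ret)
        return returns

    def anticipatory_emotions(self, s):
        # Hope ~ best anticipated return, fear ~ worst anticipated (negative) return.
        # These are simplified stand-ins for the paper's definitions.
        returns = self.simulate_returns(s)
        if not returns:
            return {"hope": 0.0, "fear": 0.0}
        return {"hope": max(0.0, max(returns)), "fear": max(0.0, -min(returns))}

    def outcome_emotions(self, s, r, s_next, alpha=0.1):
        # Joy/distress ~ positive/negative temporal-difference error (illustrative).
        td_error = r + self.gamma * self.value[s_next] - self.value[s]
        self.value[s] += alpha * td_error
        return {"joy": max(0.0, td_error), "distress": max(0.0, -td_error)}

    def act(self, s):
        # Epsilon-greedy over a one-step model lookahead.
        if random.random() < self.epsilon:
            return random.choice(self.actions)

        def score(a):
            outcomes = self.model.get((s, a))
            if not outcomes:
                return 0.0
            return sum(r + self.gamma * self.value[s2] for r, s2 in outcomes) / len(outcomes)

        return max(self.actions, key=score)
```

In the paper the forward simulation is performed with UCT (e.g. N=300, dmax=7 in the Taxi runs); the uniform rollouts above are only a placeholder for that planner.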
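The Experiment Setup row reports an ε-greedy policy with ε annealed linearly from 1 to 0.05 over the first 30000 of 100000 Pacman iterations. Below is a minimal sketch of such a schedule, assuming a simple per-step linear interpolation; the function name and defaults are illustrative, not taken from the paper.

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=30_000):
    """Linearly anneal exploration: eps moves from eps_start to eps_end
    over the first decay_steps interactions, then stays at eps_end."""
    if step >= decay_steps:
        return eps_end
    frac = step / decay_steps
    return eps_start + frac * (eps_end - eps_start)


# Example: the schedule described for the Pacman runs
# (100000 iterations, eps annealed over the first 30000).
schedule = [epsilon_at(t) for t in range(100_000)]
assert abs(schedule[0] - 1.0) < 1e-9
assert abs(schedule[-1] - 0.05) < 1e-9
```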