Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

Authors: Daniel McDuff, Ashish Kapoor

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test this in a simulated driving environment and show that it can increase the speed of learning and reduce the number of collisions during the learning stage. We conducted experiments to answer: (1) if we can build a deep predictive model that estimates a peripheral physiological response associated with SNS activity and (2) if using such predicted responses leads to sample efficiency in the RL framework. (A hedged sketch of how a predicted response could be folded into the reward appears below the table.)
Researcher Affiliation | Industry | Daniel McDuff and Ashish Kapoor, Microsoft Research, Redmond, WA, {damcduff,akapoor}@microsoft.com
Pseudocode | No | The paper describes methods and architectures but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes collecting a dataset: 'To design a reward function based on the nervous system response of the driver in the simulated environment we collected a data set of physiological recordings and synchronized first person video frames from the car.' It does not provide concrete access information (link, DOI, repository, or a formal citation for the dataset itself) for this collected data.
Dataset Splits | No | The paper states: 'In each case, the first 75% of frames from the experimental recordings were taken as training examples and the latter 25% as testing examples.' It explicitly mentions training and testing splits but does not specify a separate validation split for the dataset used to train the CNN or for the RL experiments. (A minimal sketch of this chronological split appears below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments or training the models.
Software Dependencies | No | The paper mentions software such as AirSim and methods such as DQN and CNN but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | The input frames were downsampled to 84 × 84 pixels and converted to grayscale format. They were normalized by subtracting the mean pixel value (calculated on the training set). A dense layer of 128 hidden units preceded the final layer that had linear activation units and a mean square error (MSE) loss... a batch size of 128 examples was used. Max pooling was inserted between layers 2 and 3, layers 4 and 5, and layers 7 and 8. To overcome overfitting, a dropout layer (Srivastava et al., 2014) was added after layer 7 with rate d1 = 0.5. The loss during training of the reward model was the mean squared error. Each model was trained for 50 epochs.
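
The 75/25 split quoted in the Dataset Splits row is chronological: the earliest frames are used for training and the latest for testing. A minimal sketch of that split, assuming the frames and physiological labels are stored as time-ordered NumPy arrays (the file names and variable names here are hypothetical):

```python
import numpy as np

# Hypothetical time-ordered arrays of simulator frames and physiological labels.
frames = np.load("frames.npy")              # shape: (T, 84, 84, 1)
sns_responses = np.load("sns_responses.npy")  # shape: (T,)

# Chronological 75/25 split: first 75% of frames for training, last 25% for testing.
split = int(0.75 * len(frames))
x_train, x_test = frames[:split], frames[split:]
y_train, y_test = sns_responses[:split], sns_responses[split:]
```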
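The Experiment Setup row pins down several architecture and training details: 84 × 84 grayscale inputs, mean-pixel normalization, max pooling between layers 2-3, 4-5, and 7-8, dropout of 0.5 after layer 7, a 128-unit dense layer, a linear output trained with MSE, batch size 128, and 50 epochs. Below is a minimal Keras sketch consistent with those details; the filter counts, kernel sizes, and the Adam optimizer are assumptions not stated in the paper, and the training call reuses `x_train`/`y_train` from the split sketch above.

```python
from tensorflow.keras import layers, models

def build_reward_model(input_shape=(84, 84, 1)):
    """CNN mapping a grayscale frame to a scalar predicted physiological response."""
    inputs = layers.Input(shape=input_shape)
    # Conv layers 1-2, then max pooling (between layers 2 and 3).
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Conv layers 3-4, then max pooling (between layers 4 and 5).
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Conv layers 5-7; dropout (rate 0.5) after layer 7, pooling before layer 8.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.MaxPooling2D()(x)
    # Conv layer 8, then the 128-unit dense layer and linear output with MSE loss.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="linear")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")  # MSE per the paper; Adam is an assumption
    return model

# Training configuration quoted above: mean-pixel normalization on the
# training set, batch size 128, 50 epochs.
model = build_reward_model()
mean_pixel = x_train.mean()
model.fit(x_train - mean_pixel, y_train, batch_size=128, epochs=50,
          validation_data=(x_test - mean_pixel, y_test))
```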
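For the second research question quoted in the Research Type row (whether predicted physiological responses improve sample efficiency in the RL framework), the predicted response has to be folded into the reward the agent optimizes. The sketch below is one plausible way to do that, not the authors' exact formulation: a convex combination of the extrinsic task reward and an intrinsic term derived from the predicted response. The weighting `lam`, the [0, 1] normalization, and the sign convention (lower predicted arousal yields higher intrinsic reward) are all assumptions.

```python
import numpy as np

def blended_reward(reward_model, frame, extrinsic_reward, lam=0.5):
    """Convex combination of the task reward and an intrinsic physiological term.

    Assumes `reward_model` predicts a normalized SNS arousal in [0, 1] for a
    single 84x84x1 frame; a calmer predicted state gives a higher intrinsic reward.
    """
    predicted = reward_model.predict(frame[np.newaxis, ...], verbose=0)[0, 0]
    intrinsic_reward = 1.0 - float(np.clip(predicted, 0.0, 1.0))
    return lam * extrinsic_reward + (1.0 - lam) * intrinsic_reward
```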