Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

Authors: Daniel McDuff, Ashish Kapoor

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test this in a simulated driving environment and show that it can increase the speed of learning and reduce the number of collisions during the learning stage. We conducted experiments to answer: (1) if we can build a deep predictive model that estimates a peripheral physiological response associated with SNS activity and (2) if using such predicted responses leads to sample efficiency in the RL framework. (A hedged sketch of how a predicted response could be folded into the reward appears below the table.)
Researcher Affiliation | Industry | Daniel McDuff and Ashish Kapoor, Microsoft Research, Redmond, WA, {damcduff,akapoor}@microsoft.com
Pseudocode | No | The paper describes methods and architectures but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper describes collecting a dataset: 'To design a reward function based on the nervous system response of the driver in the simulated environment we collected a data set of physiological recordings and synchronized first person video frames from the car.' It does not provide concrete access information (link, DOI, repository, or a formal citation for the dataset itself) for this collected data.
Dataset Splits | No | The paper states: 'In each case, the first 75% of frames from the experimental recordings were taken as training examples and the latter 25% as testing examples.' It explicitly mentions training and testing splits but does not specify a separate validation split for the dataset used to train the CNN or for the RL experiments. (A minimal sketch of this chronological split appears below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments or training the models.
Software Dependencies | No | The paper mentions software such as AirSim and methods such as DQN and CNN but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | The input frames were downsampled to 84 × 84 pixels and converted to grayscale format. They were normalized by subtracting the mean pixel value (calculated on the training set). A dense layer of 128 hidden units preceded the final layer that had linear activation units and a mean square error (MSE) loss... a batch size of 128 examples was used. Max pooling was inserted between layers 2 and 3, layers 4 and 5, and layers 7 and 8. To overcome overfitting, a dropout layer (Srivastava et al., 2014) was added after layer 7 with rate d1 = 0.5. The loss during training of the reward model was the mean squared error. Each model was trained for 50 epochs.
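
The 75/25 split quoted in the Dataset Splits row is chronological: the earliest frames are used for training and the latest for testing. A minimal sketch of that split, assuming the frames and physiological labels are stored as time-ordered NumPy arrays (the file names and variable names here are hypothetical):

```python
import numpy as np

# Hypothetical time-ordered arrays of simulator frames and physiological labels.
frames = np.load("frames.npy")              # shape: (T, 84, 84, 1)
sns_responses = np.load("sns_responses.npy")  # shape: (T,)

# Chronological 75/25 split: first 75% of frames for training, last 25% for testing.
split = int(0.75 * len(frames))
x_train, x_test = frames[:split], frames[split:]
y_train, y_test = sns_responses[:split], sns_responses[split:]
```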
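The Experiment Setup row pins down several architecture and training details: 84 × 84 grayscale inputs, mean-pixel normalization, max pooling between layers 2-3, 4-5, and 7-8, dropout of 0.5 after layer 7, a 128-unit dense layer, a linear output trained with MSE, batch size 128, and 50 epochs. Below is a minimal Keras sketch consistent with those details; the filter counts, kernel sizes, and the Adam optimizer are assumptions not stated in the paper, and the training call reuses `x_train`/`y_train` from the split sketch above.

```python
from tensorflow.keras import layers, models

def build_reward_model(input_shape=(84, 84, 1)):
    """CNN mapping a grayscale frame to a scalar predicted physiological response."""
    inputs = layers.Input(shape=input_shape)
    # Conv layers 1-2, then max pooling (between layers 2 and 3).
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Conv layers 3-4, then max pooling (between layers 4 and 5).
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Conv layers 5-7; dropout (rate 0.5) after layer 7, pooling before layer 8.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.MaxPooling2D()(x)
    # Conv layer 8, then the 128-unit dense layer and linear output with MSE loss.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="linear")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")  # MSE per the paper; Adam is an assumption
    return model

# Training configuration quoted above: mean-pixel normalization on the
# training set, batch size 128, 50 epochs.
model = build_reward_model()
mean_pixel = x_train.mean()
model.fit(x_train - mean_pixel, y_train, batch_size=128, epochs=50,
          validation_data=(x_test - mean_pixel, y_test))
```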
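For the second research question quoted in the Research Type row (whether predicted physiological responses improve sample efficiency in the RL framework), the predicted response has to be folded into the reward the agent optimizes. The sketch below is one plausible way to do that, not the authors' exact formulation: a convex combination of the extrinsic task reward and an intrinsic term derived from the predicted response. The weighting `lam`, the [0, 1] normalization, and the sign convention (lower predicted arousal yields higher intrinsic reward) are all assumptions.

```python
import numpy as np

def blended_reward(reward_model, frame, extrinsic_reward, lam=0.5):
    """Convex combination of the task reward and an intrinsic physiological term.

    Assumes `reward_model` predicts a normalized SNS arousal in [0, 1] for a
    single 84x84x1 frame; a calmer predicted state gives a higher intrinsic reward.
    """
    predicted = reward_model.predict(frame[np.newaxis, ...], verbose=0)[0, 0]
    intrinsic_reward = 1.0 - float(np.clip(predicted, 0.0, 1.0))
    return lam * extrinsic_reward + (1.0 - lam) * intrinsic_reward
```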