Reconciling Rewards with Predictive State Representations

Authors: Andrea Baisero, Christopher Amato

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform empirical evaluations to confirm the theory developed in this work, the issues with PSRs, and the validity of R-PSRs.
Researcher Affiliation | Academia | Andrea Baisero and Christopher Amato, Northeastern University, Boston, Massachusetts, USA ({baisero.a, c.amato}@northeastern.edu).
Pseudocode | Yes | Algorithm 1: Depth-first search of a maximal set of linearly independent intents I (a generic sketch of this idea appears after this table).
Open Source Code | Yes | Code available at https://github.com/abaisero/rl-rpsr.
Open Datasets | Yes | Our evaluation involves a total of 63 unique domains: 60 are taken from Cassandra's POMDP page [Cassandra, 1999], a repository of classic finite POMDPs from the literature; 2 are the well-known load/unload [Meuleau et al., 1999] and heaven/hell [Bonet, 1998]; and the last one is float/reset [Littman and Sutton, 2002].
Dataset Splits | No | The paper describes running policies for 1000 episodes of 100 steps in simulation environments, but it does not define train/validation/test splits in the conventional sense.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies or version numbers.
Experiment Setup | No | The paper states that policies were run for 1000 episodes of 100 steps (a minimal evaluation-loop sketch follows the table), but it does not report hyperparameters such as learning rates, batch sizes, or optimizer settings.
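
The Pseudocode row refers to Algorithm 1, a depth-first search for a maximal set of linearly independent intents. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation: intents are assumed to be represented as vectors, and the `expand` callback (which would extend an intent, e.g. by an action-observation pair) is a hypothetical placeholder.

```python
import numpy as np

def maximal_independent_set(roots, expand, tol=1e-8):
    """Depth-first search for a maximal linearly independent set of vectors.

    roots:  initial candidate vectors (e.g. the shortest intents).
    expand: callback returning the successor candidates of an accepted vector
            (hypothetical stand-in for extending an intent).
    """
    basis = []            # accepted, linearly independent vectors
    stack = list(roots)   # depth-first frontier

    def increases_rank(vec):
        if not basis:
            return np.linalg.norm(vec) > tol
        stacked = np.vstack(basis + [vec])
        # basis is kept linearly independent, so its rank equals len(basis)
        return np.linalg.matrix_rank(stacked, tol=tol) > len(basis)

    while stack:
        vec = stack.pop()              # LIFO pop gives depth-first order
        if increases_rank(vec):
            basis.append(vec)
            stack.extend(expand(vec))  # only accepted vectors are expanded further
    return basis
```

Pruning the expansion of linearly dependent candidates is what keeps such a search finite; in this sketch the returned basis plays the role of the maximal set of linearly independent intents.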
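
The Experiment Setup row notes that policies were run for 1000 episodes of 100 steps. As an illustration of that protocol only, a generic Monte-Carlo evaluation loop is sketched below; the `env`/`policy` interface and the discount factor are assumptions and are not taken from the paper or its repository.

```python
def evaluate(env, policy, num_episodes=1000, horizon=100, discount=0.95):
    """Estimate the mean discounted return over fixed-length episodes.

    env and policy follow a hypothetical interface: env.reset()/env.step(action)
    and policy.reset()/policy.act(observation)/policy.update(action, observation).
    The discount factor of 0.95 is an assumption, not a value from the paper.
    """
    returns = []
    for _ in range(num_episodes):
        observation = env.reset()
        policy.reset()
        episode_return, gamma = 0.0, 1.0
        for _ in range(horizon):
            action = policy.act(observation)
            observation, reward, done = env.step(action)
            policy.update(action, observation)  # keep the agent's internal state current
            episode_return += gamma * reward
            gamma *= discount
            if done:
                break
        returns.append(episode_return)
    return sum(returns) / len(returns)
```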