Reconciling Rewards with Predictive State Representations

Authors: Andrea Baisero, Christopher Amato

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform empirical evaluations to confirm the theory developed in this work, the issues with PSRs, and the validity of R-PSRs.
Researcher Affiliation | Academia | Andrea Baisero and Christopher Amato, Northeastern University, Boston, Massachusetts, USA ({baisero.a, c.amato}@northeastern.edu).
Pseudocode | Yes | Algorithm 1: Depth-first search of a maximal set of linearly independent intents I (a generic sketch of this idea appears after this table).
Open Source Code | Yes | Code available at https://github.com/abaisero/rl-rpsr.
Open Datasets | Yes | Our evaluation involves a total of 63 unique domains: 60 are taken from Cassandra's POMDP page [Cassandra, 1999], a repository of classic finite POMDPs from the literature; 2 are the well-known load/unload [Meuleau et al., 1999] and heaven/hell [Bonet, 1998]; and the last one is float/reset [Littman and Sutton, 2002].
Dataset Splits | No | The paper describes running policies for 1000 episodes of 100 steps in simulation environments, but it does not define train/validation/test splits in the conventional sense.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies or version numbers.
Experiment Setup | No | The paper states that policies were run for 1000 episodes of 100 steps (a minimal evaluation-loop sketch follows the table), but it does not report hyperparameters such as learning rates, batch sizes, or optimizer settings.
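
The Pseudocode row refers to Algorithm 1, a depth-first search for a maximal set of linearly independent intents. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation: intents are assumed to be represented as vectors, and the `expand` callback (which would extend an intent, e.g. by an action-observation pair) is a hypothetical placeholder.

```python
import numpy as np

def maximal_independent_set(roots, expand, tol=1e-8):
    """Depth-first search for a maximal linearly independent set of vectors.

    roots:  initial candidate vectors (e.g. the shortest intents).
    expand: callback returning the successor candidates of an accepted vector
            (hypothetical stand-in for extending an intent).
    """
    basis = []            # accepted, linearly independent vectors
    stack = list(roots)   # depth-first frontier

    def increases_rank(vec):
        if not basis:
            return np.linalg.norm(vec) > tol
        stacked = np.vstack(basis + [vec])
        # basis is kept linearly independent, so its rank equals len(basis)
        return np.linalg.matrix_rank(stacked, tol=tol) > len(basis)

    while stack:
        vec = stack.pop()              # LIFO pop gives depth-first order
        if increases_rank(vec):
            basis.append(vec)
            stack.extend(expand(vec))  # only accepted vectors are expanded further
    return basis
```

Pruning the expansion of linearly dependent candidates is what keeps such a search finite; in this sketch the returned basis plays the role of the maximal set of linearly independent intents.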
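
The Experiment Setup row notes that policies were run for 1000 episodes of 100 steps. As an illustration of that protocol only, a generic Monte-Carlo evaluation loop is sketched below; the `env`/`policy` interface and the discount factor are assumptions and are not taken from the paper or its repository.

```python
def evaluate(env, policy, num_episodes=1000, horizon=100, discount=0.95):
    """Estimate the mean discounted return over fixed-length episodes.

    env and policy follow a hypothetical interface: env.reset()/env.step(action)
    and policy.reset()/policy.act(observation)/policy.update(action, observation).
    The discount factor of 0.95 is an assumption, not a value from the paper.
    """
    returns = []
    for _ in range(num_episodes):
        observation = env.reset()
        policy.reset()
        episode_return, gamma = 0.0, 1.0
        for _ in range(horizon):
            action = policy.act(observation)
            observation, reward, done = env.step(action)
            policy.update(action, observation)  # keep the agent's internal state current
            episode_return += gamma * reward
            gamma *= discount
            if done:
                break
        returns.append(episode_return)
    return sum(returns) / len(returns)
```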