Reconciling Rewards with Predictive State Representations
Authors: Andrea Baisero, Christopher Amato
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform empirical evaluations to confirm the theory developed in this work, the issues with PSRs, and the validity of R-PSRs. |
| Researcher Affiliation | Academia | Andrea Baisero, Christopher Amato; Northeastern University, Boston, Massachusetts, USA; {baisero.a, c.amato}@northeastern.edu |
| Pseudocode | Yes | "Algorithm 1: Depth-first search of a maximal set of linearly independent intents I." (a hedged sketch of this search appears after the table) |
| Open Source Code | Yes | Code available at https://github.com/abaisero/rl-rpsr. |
| Open Datasets | Yes | Our evaluation involves a total of 63 unique domains: 60 are taken from Cassandra's POMDP page [Cassandra, 1999], a repository of classic finite POMDPs from the literature; 2 are the well-known load/unload [Meuleau et al., 1999] and heaven/hell [Bonet, 1998]; and the last one is float/reset [Littman and Sutton, 2002]. |
| Dataset Splits | No | The paper describes running policies for 1000 episodes of 100 steps in simulation environments, but does not provide explicit train/validation/test dataset splits in the conventional sense. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper states that policies were run for '1000 episodes of 100 steps' but does not provide specific hyperparameters like learning rates, batch sizes, or optimizer settings typically found in experimental setups. |
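The pseudocode row cites Algorithm 1, a depth-first search for a maximal set of linearly independent intents. The snippet below is a minimal sketch of such a search, not the authors' implementation: the `seed_intents`, `extend`, and `outcome_vector` names and the rank-based independence test are assumptions made for illustration, with the R-PSR-specific definitions left abstract.

```python
import numpy as np

def find_core_intents(seed_intents, extend, outcome_vector, tol=1e-8):
    """Depth-first search for a maximal set of intents whose outcome
    vectors are linearly independent (generic sketch, not the paper's code).

    seed_intents      -- iterable of initial candidate intents (assumed)
    extend(i)         -- returns child intents obtained by extending intent i (assumed)
    outcome_vector(i) -- maps an intent to its outcome vector in R^n (assumed)
    """
    core, vectors = [], []          # selected intents and their outcome vectors
    stack = list(seed_intents)      # DFS frontier
    while stack:
        intent = stack.pop()
        v = np.asarray(outcome_vector(intent), dtype=float)
        candidate = np.column_stack(vectors + [v]) if vectors else v[:, None]
        # Keep the intent only if its outcome vector is linearly independent
        # of the ones already selected, i.e. the matrix rank increases.
        if np.linalg.matrix_rank(candidate, tol=tol) > len(core):
            core.append(intent)
            vectors.append(v)
            stack.extend(extend(intent))  # explore extensions depth-first
    return core
```

In a PSR or R-PSR setting, `extend` would typically append an action-observation element to a test-like object and `outcome_vector` would be computed from the system-dynamics matrix; both are left as callables here because the paper's exact definitions are not reproduced in this table. The released code at https://github.com/abaisero/rl-rpsr contains the authors' actual implementation.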