Representation Discovery for MDPs Using Bisimulation Metrics

Authors: Sherry Ruan, Gheorghe Comanici, Prakash Panangaden, Doina Precup

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the improvement by computing bisimulation metrics over a series of MDPs (the well-known Puddle World problem)... The bottom panel of Figure 1 shows the runtime of computing bisimulation metrics over partitions for up to 14 iterations of Algorithm 1. These empirical results illustrate that the computational complexity of computing bisimulation-based representations and corresponding metrics is mostly dependent on the intrinsic complexity of the reward function and transition models. To assess the importance of the asynchronous partition and metric update, we fixed the size of the Puddle World and compared the value function approximation error as a function of the size of intermediate representations. ...As can be seen in Figure 2, the asynchronous algorithm reaches representations of higher quality at much earlier stages of the iterative framework. This empirical result illustrates the use of heuristics can substantially speed up the computation.
Researcher Affiliation | Academia | Sherry Shanshan Ruan, Gheorghe Comanici, Prakash Panangaden, and Doina Precup. School of Computer Science, McGill University, Montreal, QC, Canada. {sherry, gcoman, prakash, dprecup}@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1 Partition declustering
Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the described methodology.
Open Datasets | No | The paper mentions 'the well-known Puddle World problem' but does not provide concrete access information (specific link, DOI, repository name, or a formal citation with authors/year) for the dataset used in their experiments.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | No | The paper does not contain specific experimental setup details such as concrete hyperparameter values, detailed training configurations, or system-level settings in the main text.