Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
Authors: Chris Nota, Philip Thomas, Bruno C. Da Silva
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further illustrate the variance reduction properties of posterior value functions on a tabular gridworld domain with partial observability. We compared agents using learned estimates of the posterior, prior, and observation value functions as baselines for the policy gradient theorem. |
| Researcher Affiliation | Academia | College of Information and Computer Science, University of Massachusetts, Amherst, MA. |
| Pseudocode | No | The paper describes methods using mathematical equations and prose but does not include structured pseudocode or algorithm blocks with explicit labels like 'Algorithm'. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper uses a custom-designed 'tabular gridworld domain' shown in Figure 5, but it does not provide concrete access information (link, DOI, repository, or formal citation) indicating that this environment is publicly available. |
| Dataset Splits | No | The paper describes an experimental setup within a gridworld environment and mentions training policies, but it does not provide specific train/validation/test splits (percentages, sample counts, or citations to predefined splits); the gridworld is a simulation environment rather than a static dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions 'standard REINFORCE with baselines algorithms' (sketched after this table) but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries). |
| Experiment Setup | No | The paper describes the gridworld environment setup and general training approach (REINFORCE with baselines) and how results were averaged, but it defers 'full experimental details' to supplemental material and does not provide specific hyperparameters or system-level training settings in the main text. |
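
As background for the Research Type row's description of posterior, prior, and observation value functions used as baselines, the generic baseline-subtracted policy gradient (standard REINFORCE with a baseline, not the paper's specific posterior construction) is sketched below; the symbols $o_t$, $a_t$, $r_{t+1}$, $G_t$, and $b_t$ are conventional notation assumed for this sketch rather than taken from the paper.

```latex
% Generic baseline-subtracted policy gradient (REINFORCE with baseline).
% Any baseline b_t that does not depend on the action a_t leaves the
% estimator unbiased, since E_{a_t ~ pi_theta}[ \nabla log pi_theta(a_t|o_t) ] = 0,
% while a well-chosen b_t can substantially reduce its variance.
\nabla_\theta J(\theta)
  = \mathbb{E}\!\left[ \sum_{t=0}^{T-1}
      \nabla_\theta \log \pi_\theta(a_t \mid o_t)\,
      \bigl( G_t - b_t \bigr) \right],
\qquad
G_t = \sum_{k=t}^{T-1} \gamma^{\,k-t}\, r_{k+1}.
```

Per the quoted description, the paper's observation, prior, and posterior value functions correspond to conditioning $b_t$ on different information about the hidden state; the conditions under which a hindsight-conditioned (posterior) baseline remains valid are the paper's contribution and are not reproduced in this sketch.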
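The Software Dependencies and Experiment Setup rows refer to 'REINFORCE with baselines' agents trained on a partially observable tabular gridworld. Below is a minimal, self-contained sketch of such an agent using an observation-conditioned baseline (the simplest of the three baselines compared); the corridor environment, hyperparameters, and helper names (`observe`, `step`, `run_episode`, `N_STATES`, etc.) are illustrative assumptions introduced here, not the paper's setup or code.

```python
# Minimal sketch: tabular REINFORCE with a learned, observation-conditioned
# baseline on a toy partially observable corridor. Illustrative only; the
# environment, hyperparameters, and baseline choice are assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 8          # hidden corridor positions
N_OBS = 2             # aliased observation: goal-adjacent or not
N_ACTIONS = 2         # 0 = left, 1 = right
GAMMA = 0.99
ALPHA_PI = 0.1        # policy step size
ALPHA_V = 0.1         # baseline step size


def observe(s):
    """Aliased observation: the agent only sees whether it is next to the goal."""
    return 1 if s == N_STATES - 2 else 0


def step(s, a):
    """Deterministic corridor dynamics; the episode ends at the rightmost cell."""
    s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    reward = 1.0 if done else -0.01
    return s_next, reward, done


def run_episode(theta, max_steps=100):
    """Roll out one episode, returning (observation, action, reward) triples."""
    s, traj = 0, []
    for _ in range(max_steps):
        o = observe(s)
        probs = np.exp(theta[o]) / np.exp(theta[o]).sum()   # softmax policy
        a = rng.choice(N_ACTIONS, p=probs)
        s, r, done = step(s, a)
        traj.append((o, a, r))
        if done:
            break
    return traj


theta = np.zeros((N_OBS, N_ACTIONS))   # softmax policy parameters per observation
v = np.zeros(N_OBS)                    # learned baseline per observation

for _ in range(2000):
    traj = run_episode(theta)
    G = 0.0
    # Walk the trajectory backwards so the return G_t is accumulated cheaply.
    for o, a, r in reversed(traj):
        G = r + GAMMA * G
        advantage = G - v[o]           # baseline subtraction reduces variance
        v[o] += ALPHA_V * advantage    # Monte Carlo update of the baseline
        probs = np.exp(theta[o]) / np.exp(theta[o]).sum()
        grad_log = -probs
        grad_log[a] += 1.0             # gradient of log softmax w.r.t. logits
        theta[o] += ALPHA_PI * advantage * grad_log

print("Final policy (P[right] per observation):")
for o in range(N_OBS):
    probs = np.exp(theta[o]) / np.exp(theta[o]).sum()
    print(f"  obs {o}: {probs[1]:.2f}")
```

Subtracting `v[o]` from the Monte Carlo return before the policy update is the quantity the paper's comparison varies: the prior and posterior value functions would replace this observation-conditioned estimate with baselines conditioned on richer information about the hidden state.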