Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods

Authors: Chris Nota, Philip Thomas, Bruno C. Da Silva

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further illustrate the variance reduction properties of posterior value functions on a tabular gridworld domain with partial observability. We compared agents using learned estimates of the posterior, prior, and observation value functions as baselines for the policy gradient theorem.
Researcher Affiliation | Academia | College of Information and Computer Science, University of Massachusetts, Amherst, MA.
Pseudocode | No | The paper describes methods using mathematical equations and prose but does not include structured pseudocode or algorithm blocks with explicit labels like 'Algorithm'.
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | No | The paper uses a custom-designed 'tabular gridworld domain' shown in Figure 5, but it does not provide concrete access information (link, DOI, repository, or formal citation) for this dataset to be publicly available.
Dataset Splits | No | The paper describes an experimental setup within a gridworld environment and mentions training policies, but it does not provide specific train/validation/test dataset splits (percentages, counts, or predefined split citations), as it appears to be a simulation environment rather than a static dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions 'standard REINFORCE with baselines algorithms' but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries).
Experiment Setup | No | The paper describes the gridworld environment setup and general training approach (REINFORCE with baselines) and how results were averaged, but it defers 'full experimental details' to supplemental material and does not provide specific hyperparameters or system-level training settings in the main text.
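The Research Type and Software Dependencies rows above describe the experimental setup as standard REINFORCE with baselines on a partially observable tabular gridworld. As a rough illustration of that setup (not the authors' code), below is a minimal sketch of tabular REINFORCE with a learned value-function baseline. The environment interface, table sizes, and learning rates are placeholder assumptions; the paper's comparison concerns what the baseline is conditioned on (posterior vs. prior vs. observation value functions), not the update rule itself.

```python
# Minimal sketch, not the authors' implementation: tabular REINFORCE with a
# learned value-function baseline. `env` is a hypothetical environment exposing
# reset() -> obs and step(action) -> (obs, reward, done); sizes and learning
# rates are placeholders.

import numpy as np

rng = np.random.default_rng(0)

N_OBS, N_ACTIONS = 16, 4          # placeholder sizes for a small gridworld
GAMMA, LR_PI, LR_V = 0.99, 0.1, 0.1

theta = np.zeros((N_OBS, N_ACTIONS))   # tabular policy logits, indexed by observation
baseline = np.zeros(N_OBS)             # learned value estimate used as the baseline


def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()


def run_episode(env):
    """Roll out one episode under the current softmax policy."""
    obs, done, traj = env.reset(), False, []
    while not done:
        probs = softmax(theta[obs])
        action = rng.choice(N_ACTIONS, p=probs)
        next_obs, reward, done = env.step(action)
        traj.append((obs, action, reward))
        obs = next_obs
    return traj


def reinforce_update(traj):
    """REINFORCE with a baseline: subtract a learned value estimate from the
    Monte Carlo return before taking the policy-gradient step."""
    G = 0.0
    for obs, action, reward in reversed(traj):
        G = reward + GAMMA * G
        advantage = G - baseline[obs]
        # Gradient of log softmax policy w.r.t. the logits for this observation.
        grad_logpi = -softmax(theta[obs])
        grad_logpi[action] += 1.0
        theta[obs] += LR_PI * advantage * grad_logpi
        # Regress the baseline toward the observed return.
        baseline[obs] += LR_V * (G - baseline[obs])
```

Swapping in the paper's posterior, prior, or observation value functions would change only how the baseline estimate is conditioned; the REINFORCE update above stays the same.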