Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs

Authors: Thomas Spooner, Nelson Vadori, Sumitra Ganesh

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The final contribution is to illustrate the effectiveness of our approach over traditional estimators on two high-dimensional benchmark domains." (Introduction, Contributions section) and Section 5, Numerical Experiments, with figures showing performance (e.g., Figure 4, Figure 5). |
| Researcher Affiliation | Industry | Thomas Spooner, J. P. Morgan AI Research (thomas.spooner@jpmorgan.com); Nelson Vadori, J. P. Morgan AI Research (nelson.vadori@jpmorgan.com); Sumitra Ganesh, J. P. Morgan AI Research (sumitra.ganesh@jpmorgan.com). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper neither states that source code for the described methodology is released nor links to a code repository. |
| Open Datasets | Yes | "In particular, we consider variants of the (3 × 3) grid network benchmark environment as originally proposed by Vinitsky et al. [51] that is provided by the outstanding Flow framework [56, 21]." |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test splits (e.g., percentages or sample counts) for the datasets or environments used. It mentions averaging results over 10 random seeds, but not data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud computing instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software such as the Flow framework and algorithms such as PPO and GAE, but it does not give version numbers for these or any other software dependencies needed for reproduction. |
| Experiment Setup | Yes | "In our experiments, the centroids were initialised with a uniform distribution, c ∼ U(−5, 5), and were held fixed between episodes. The policy was defined as an isotropic Gaussian with fixed covariance, diag(1), and initial location vector µ := 0. The parameter vector, µ, was updated at each time step, and the hyperparameters are provided in the appendix." |
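The Experiment Setup quote above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the state dimensionality `d`, the number of centroids, and the helper `sample_action` are assumptions made for illustration; only the U(−5, 5) centroid initialisation and the isotropic Gaussian policy with covariance diag(1) and initial mean 0 come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2            # illustrative dimensionality (not stated in the excerpt)
n_centroids = 4  # illustrative count (not stated in the excerpt)

# Centroids drawn once from U(-5, 5) and held fixed between episodes.
centroids = rng.uniform(-5.0, 5.0, size=(n_centroids, d))

# Isotropic Gaussian policy: fixed covariance diag(1), initial mean mu = 0.
mu = np.zeros(d)
cov = np.eye(d)

def sample_action(mu, cov, rng):
    """Sample an action from the Gaussian policy N(mu, cov)."""
    return rng.multivariate_normal(mu, cov)

action = sample_action(mu, cov, rng)
```

Per the quote, only `mu` would then be updated at each time step; `centroids` and `cov` stay fixed.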