Factored Policy Gradients: Leveraging Structure for Efficient Learning in MOMDPs
Authors: Thomas Spooner, Nelson Vadori, Sumitra Ganesh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The final contribution is to illustrate the effectiveness of our approach over traditional estimators on two high-dimensional benchmark domains." (Introduction, Contributions). Section 5, Numerical Experiments, also reports performance in figures (e.g., Figures 4 and 5). |
| Researcher Affiliation | Industry | Thomas Spooner (J. P. Morgan AI Research, thomas.spooner@jpmorgan.com); Nelson Vadori (J. P. Morgan AI Research, nelson.vadori@jpmorgan.com); Sumitra Ganesh (J. P. Morgan AI Research, sumitra.ganesh@jpmorgan.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In particular, we consider variants of the (3 × 3) grid network benchmark environment as originally proposed by Vinitsky et al. [51] that is provided by the outstanding Flow framework [56, 21]. |
| Dataset Splits | No | The paper does not specify explicit training, validation, and test splits (e.g., percentages or sample counts) for the datasets or environments used. It mentions using '10 random seeds' for averaging results but not data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud computing instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'Flow framework' and algorithms like 'PPO' and 'GAE', but it does not provide specific version numbers for these or any other software dependencies needed for reproduction. |
| Experiment Setup | Yes | In our experiments, the centroids were initialised with a uniform distribution, c ∼ U(−5, 5), and were held fixed between episodes. The policy was defined as an isotropic Gaussian with fixed covariance, diag(1), and initial location vector µ := 0. The parameter vector, µ, was updated at each time step, and the hyperparameters are provided in the appendix. |
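
The quoted experiment setup can be illustrated with a minimal Python sketch. It covers only what the quote states: centroids drawn once from U(−5, 5), and a Gaussian policy with identity covariance and initial location µ = 0. The function names, the dimension argument, and the seed are hypothetical, and the paper's cost function, update rule, and hyperparameters are not reproduced here. The `score` helper is the standard gradient of the Gaussian log-density with respect to µ, not the paper's factored estimator.

```python
import numpy as np


def make_setup(dim, seed=0):
    """Draw fixed centroids and the initial policy location (names are hypothetical)."""
    rng = np.random.default_rng(seed)
    # c ~ U(-5, 5); drawn once and then held fixed between episodes.
    centroids = rng.uniform(-5.0, 5.0, size=dim)
    # Initial location vector mu := 0.
    mu = np.zeros(dim)
    return rng, centroids, mu


def sample_action(rng, mu):
    """Sample from an isotropic Gaussian policy with covariance diag(1)."""
    return mu + rng.standard_normal(mu.shape)


def score(action, mu):
    """Gradient of log N(action; mu, I) with respect to mu."""
    return action - mu
```

A call such as `rng, centroids, mu = make_setup(dim=8)` followed by `sample_action(rng, mu)` yields one action. Any update of µ would additionally need the cost and step size from the paper's appendix.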