Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Authors: Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM. |
| Researcher Affiliation | Academia | ¹Northwestern University, ²Georgia Tech |
| Pseudocode | Yes | Algorithm 1 Model-Based Reparameterization Policy Gradient |
| Open Source Code | Yes | Our code is available at https://github.com/agentification/RP_PGM. |
| Open Datasets | No | The paper refers to "MuJoCo [57] tasks" as the environment for experiments, but does not provide concrete access information (link, DOI, formal citation with authors/year, or specific dataset name) for a publicly available dataset used for training. |
| Dataset Splits | No | The paper does not specify exact percentages, sample counts, or refer to predefined splits for training, validation, or test datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using "PyTorch [44]" but does not provide a specific version number for this or any other software dependency. |
| Experiment Setup | No | The paper describes network architectures and algorithm choices, but does not provide specific hyperparameters such as learning rate, batch size, or number of training epochs for the experimental setup. |