f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences
Authors: Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that f-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and Fetch Reach environments. |
| Researcher Affiliation | Collaboration | Siddhant Agarwal, The University of Texas at Austin (siddhant@cs.utexas.edu); Ishan Durugkar, Sony AI (ishan.durugkar@sony.com); Peter Stone, The University of Texas at Austin and Sony AI (pstone@cs.utexas.edu); Amy Zhang, The University of Texas at Austin (amy.zhang@austin.utexas.edu) |
| Pseudocode | Yes | Algorithm 1 f-PG |
| Open Source Code | No | More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html. The linked page is a project overview that lists the code as "coming soon"; the code is not currently available. |
| Open Datasets | Yes | We use the Point Maze environments (Fu et al., 2020), which are a set of offline RL environments, and modify them to support our online algorithms. |
| Dataset Splits | No | The paper describes the environments and reports success rates, but does not provide specific details on how the data was split into training, validation, and testing sets. |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software like Soft Q Learning and PPO implementations but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | No | The paper describes the general algorithm and environment characteristics (e.g., goal distribution standard deviation for Reacher, initial state sampling for Point Maze) but does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings used for training the models in the experiments. |
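
The Pseudocode row above points to Algorithm 1 (f-PG) in the paper, which is not reproduced here. As a rough, illustrative sketch of the f-divergence machinery the title refers to, the snippet below computes a generic f-divergence D_f(P ‖ Q) = Σ_x Q(x) f(P(x)/Q(x)) between two discrete distributions, with forward KL (f(u) = u log u) as one instance. The function names, the gridworld framing, and the example numbers are assumptions made for illustration; this is not the authors' Algorithm 1 or their exact objective.

```python
import numpy as np

def f_divergence(p, q, f, eps=1e-12):
    """Generic f-divergence D_f(P || Q) = sum_x Q(x) * f(P(x) / Q(x))
    for two discrete distributions given as 1-D arrays summing to 1.

    `f` must be convex with f(1) = 0; `eps` guards against division by zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    ratio = (p + eps) / (q + eps)
    return float(np.sum(q * f(ratio)))

# Forward KL as one member of the f-divergence family: f(u) = u * log(u).
kl_generator = lambda u: u * np.log(u)

# Hypothetical example: an agent's state-visitation distribution on a tiny
# gridworld versus a goal distribution concentrated on one state.
# These numbers are invented purely for illustration.
visitation = np.array([0.40, 0.30, 0.20, 0.10])
goal_dist = np.array([0.05, 0.05, 0.10, 0.80])

print(f_divergence(goal_dist, visitation, kl_generator))  # D_KL(goal || visitation)
```

Such a quantity is only a building block: the paper's Algorithm 1 specifies how an f-divergence enters the policy gradient, and the Experiment Setup row above notes that the accompanying training hyperparameters (learning rate, batch size, optimizer settings) are not reported.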