f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences

Authors: Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that f-PG has better performance compared to standard policy gradient methods on a challenging gridworld as well as the Point Maze and Fetch Reach environments. More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html.
Researcher Affiliation | Collaboration | Siddhant Agarwal, The University of Texas at Austin, siddhant@cs.utexas.edu; Ishan Durugkar, Sony AI, ishan.durugkar@sony.com; Peter Stone, The University of Texas at Austin and Sony AI, pstone@cs.utexas.edu; Amy Zhang, The University of Texas at Austin, amy.zhang@austin.utexas.edu
Pseudocode | Yes | Algorithm 1: f-PG (a rough, illustrative sketch in this spirit appears after this table)
Open Source Code | No | More information on our website https://agarwalsiddhant10.github.io/projects/fpg.html. The linked website is a project overview page that lists the code as "coming soon"; the code is not currently available.
Open Datasets | Yes | We use the Point Maze environments (Fu et al., 2020) which are a set of offline RL environments, and modify it to support our online algorithms. (An environment-setup sketch follows after this table.)
Dataset Splits | No | The paper describes the environments and reports success rates, but does not provide specific details on how the data was split into training, validation, and testing sets.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software like Soft Q Learning and PPO implementations but does not specify version numbers for these or any other software dependencies.
Experiment Setup | No | The paper describes the general algorithm and environment characteristics (e.g., goal distribution standard deviation for Reacher, initial state sampling for Point Maze) but does not provide specific hyperparameters such as learning rate, batch size, or optimizer settings used for training the models in the experiments.
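
The Pseudocode row refers to Algorithm 1 (f-PG) in the paper, which is not reproduced here. As orientation only, the sketch below shows a generic divergence-minimizing, REINFORCE-style update in the same spirit: a per-state signal derived from estimated goal and state-visitation densities is plugged into a policy gradient. The function fpg_style_update, the policy.log_prob helper, the density callables, and the log-ratio reward (the KL member of the f-divergence family) are all illustrative assumptions, not the authors' Algorithm 1, whose exact gradient should be taken from the paper.

# Minimal, hypothetical sketch of a divergence-minimizing policy-gradient update.
# NOT the authors' Algorithm 1: the log-ratio reward, the density estimators,
# and policy.log_prob are illustrative assumptions.
import torch

def fpg_style_update(policy, optimizer, states, actions, log_p_goal, log_p_agent, gamma=0.99):
    # states: [T, obs_dim], actions: [T, act_dim] from one rollout of the current policy.
    with torch.no_grad():
        # Per-state signal from estimated densities of the goal distribution and
        # the agent's state-visitation distribution.
        rewards = log_p_goal(states) - log_p_agent(states)  # shape [T]
    # Discounted reward-to-go.
    returns = torch.zeros_like(rewards)
    running = torch.zeros(())
    for t in reversed(range(rewards.shape[0])):
        running = rewards[t] + gamma * running
        returns[t] = running
    # REINFORCE surrogate loss; policy.log_prob is assumed to return the
    # differentiable log pi(a_t | s_t) under the current parameters.
    log_probs = policy.log_prob(states, actions)
    loss = -(log_probs * returns).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)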
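
The Open Datasets row notes that the paper uses the D4RL Point Maze environments (Fu et al., 2020) modified for online training, along with Fetch Reach; the modified environments themselves are not released. As a rough, hypothetical substitute, the snippet below instantiates the closely related goal-conditioned Point Maze and Fetch Reach tasks shipped with gymnasium-robotics. The package, the environment IDs, and the dict observation layout are assumptions about that library, not the authors' setup.

# Hypothetical substitute setup via gymnasium-robotics (not the authors' modified
# D4RL environments): the Point Maze tasks were ported there as online,
# goal-conditioned environments.
import gymnasium as gym
import gymnasium_robotics  # importing the package registers PointMaze_* and Fetch* envs

for env_id in ("PointMaze_UMaze-v3", "FetchReach-v2"):
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)
    # Goal-conditioned observations arrive as a dict.
    print(env_id, sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.close()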