Bregman Gradient Policy Optimization

Authors: Feihu Huang, Shangqian Gao, Heng Huang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms."
Researcher Affiliation | Academia | Feihu Huang, Shangqian Gao, Heng Huang; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA; huangfeihu2018@gmail.com, shg84@pitt.edu, heng.huang@pitt.edu
Pseudocode | Yes | "Algorithm 1 BGPO Algorithm" and "Algorithm 2 VR-BGPO Algorithm"
Open Source Code | Yes | "Our code is available at https://github.com/gaosh/BGPO."
Open Datasets | Yes | "To test the effectiveness of two different Bregman divergences, we evaluate them on three classic control environments from gym (Brockman et al., 2016): CartPole-v1, Acrobot-v1, and MountainCarContinuous-v0. ... To evaluate the performance of these algorithms, we test them on six gym (Brockman et al., 2016) environments with continuous control tasks..."
Dataset Splits | No | The paper specifies environment names, batch sizes, and numbers of timesteps, but it does not provide percentages or sample counts for training, validation, and test splits. It refers to gym environments but does not state any splits within the paper.
Hardware Specification | No | The paper does not provide details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications; it only discusses software environments and general experiment setups.
Software Dependencies | No | "All methods, including our method, are implemented with garage (garage contributors, 2019) and pytorch (Paszke et al., 2019)." The paper names the software packages garage and PyTorch but does not specify version numbers or any other software dependencies with version information.
Experiment Setup | Yes | "Table 2: Setups of environments and hyper-parameters for experiments in section 6.2 and section 6.3." "Table 3: Setups of environments and hyper-parameters for experiments in section 6.4." These tables detail network sizes, numbers of timesteps, batch sizes, and specific hyperparameters such as {b, m, c} values, λ, and learning rates for each environment.
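The pseudocode row references Algorithm 1 (BGPO) and Algorithm 2 (VR-BGPO), which build on Bregman (mirror-descent) gradient updates. As background only, the sketch below shows the generic mirror-descent step with a negative-entropy Bregman divergence on the probability simplex; it is a minimal illustration of the underlying technique, not the paper's BGPO update, and all names in it are illustrative.

```python
import math

def mirror_descent_step(x, grad, eta):
    """One generic Bregman (mirror-descent) step on the probability simplex.

    With the negative-entropy mirror map, the Bregman proximal update
        x_next = argmin_x  <grad, x> + (1/eta) * KL(x || x_current)
    has the closed-form multiplicative solution computed below.
    This is the classic building block behind Bregman gradient methods,
    not the paper's exact algorithm.
    """
    # Multiplicative-weights update, then renormalize onto the simplex.
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: probability mass shifts away from high-gradient coordinates.
x = [1 / 3, 1 / 3, 1 / 3]
g = [1.0, 0.0, -1.0]
x_next = mirror_descent_step(x, g, eta=0.5)
```

After one step, `x_next` remains a valid distribution, with the most mass on the coordinate whose gradient is most negative.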