Bregman Gradient Policy Optimization

Authors: Feihu Huang, Shangqian Gao, Heng Huang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms."
Researcher Affiliation | Academia | Feihu Huang, Shangqian Gao, Heng Huang; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA; huangfeihu2018@gmail.com, shg84@pitt.edu, heng.huang@pitt.edu
Pseudocode | Yes | "Algorithm 1 BGPO Algorithm" and "Algorithm 2 VR-BGPO Algorithm"
Open Source Code | Yes | "Our code is available at https://github.com/gaosh/BGPO."
Open Datasets | Yes | "To test the effectiveness of two different Bregman divergences, we evaluate them on three classic control environments from gym (Brockman et al., 2016): CartPole-v1, Acrobot-v1, and MountainCarContinuous-v0. ... To evaluate the performance of these algorithms, we test them on six gym (Brockman et al., 2016) environments with continuous control tasks..."
Dataset Splits | No | The paper specifies environment names, batch sizes, and numbers of timesteps, but it does not provide percentages or sample counts for training, validation, and test splits. It refers to gym environments but does not state any splits within the paper.
Hardware Specification | No | The paper does not provide details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications; it only discusses software environments and general experiment setups.
Software Dependencies | No | "All methods, including our method, are implemented with garage (garage contributors, 2019) and pytorch (Paszke et al., 2019)." The paper names the software packages garage and PyTorch but does not specify version numbers or any other software dependencies with version information.
Experiment Setup | Yes | "Table 2: Setups of environments and hyper-parameters for experiments in section 6.2 and section 6.3." "Table 3: Setups of environments and hyper-parameters for experiments in section 6.4." These tables detail network sizes, numbers of timesteps, batch sizes, and specific hyperparameters such as {b, m, c} values, λ, and learning rates for each environment.
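The pseudocode row references Algorithm 1 (BGPO) and Algorithm 2 (VR-BGPO), which build on Bregman (mirror-descent) gradient updates. As background only, the sketch below shows the generic mirror-descent step with a negative-entropy Bregman divergence on the probability simplex; it is a minimal illustration of the underlying technique, not the paper's BGPO update, and all names in it are illustrative.

```python
import math

def mirror_descent_step(x, grad, eta):
    """One generic Bregman (mirror-descent) step on the probability simplex.

    With the negative-entropy mirror map, the Bregman proximal update
        x_next = argmin_x  <grad, x> + (1/eta) * KL(x || x_current)
    has the closed-form multiplicative solution computed below.
    This is the classic building block behind Bregman gradient methods,
    not the paper's exact algorithm.
    """
    # Multiplicative-weights update, then renormalize onto the simplex.
    weights = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: probability mass shifts away from high-gradient coordinates.
x = [1 / 3, 1 / 3, 1 / 3]
g = [1.0, 0.0, -1.0]
x_next = mirror_descent_step(x, g, eta=0.5)
```

After one step, `x_next` remains a valid distribution, with the most mass on the coordinate whose gradient is most negative.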