Bregman Gradient Policy Optimization
Authors: Feihu Huang, Shangqian Gao, Heng Huang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms. |
| Researcher Affiliation | Academia | Feihu Huang, Shangqian Gao, Heng Huang; Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA; huangfeihu2018@gmail.com, shg84@pitt.edu, heng.huang@pitt.edu |
| Pseudocode | Yes | Algorithm 1 (BGPO Algorithm) and Algorithm 2 (VR-BGPO Algorithm). A generic mirror-descent sketch of this style of update follows the table. |
| Open Source Code | Yes | Our code is available at https://github.com/gaosh/BGPO. |
| Open Datasets | Yes | To test the effectiveness of two different Bregman divergences, we evaluate them on three classic control environments from gym (Brockman et al., 2016): CartPole-v1, Acrobot-v1, and MountainCarContinuous-v0. ... To evaluate the performance of these algorithms, we test them on six gym (Brockman et al., 2016) environments with continuous control tasks... (An instantiation snippet for the classic-control tasks follows the table.) |
| Dataset Splits | No | The paper specifies environment names, batch sizes, and numbers of timesteps, but it does not provide percentages or sample counts for training, validation, and test splits. Since the experiments are reinforcement learning runs in gym environments, fixed dataset splits do not apply in the usual supervised-learning sense, and none are stated in the paper. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It only discusses software environments and general experiment setups. |
| Software Dependencies | No | "All methods, including our method, are implemented with garage (garage contributors, 2019) and pytorch (Paszke et al., 2019)." The paper names the garage and PyTorch packages but does not specify their version numbers or any other pinned dependencies. (A version-recording snippet follows the table.) |
| Experiment Setup | Yes | Table 2: Setups of environments and hyper-parameters for experiments in section 6.2 and section 6.3. Table 3: Setups of environments and hyper-parameters for experiments in section 6.4. These tables detail network sizes, numbers of timesteps, batch sizes, and specific hyperparameters such as the {b, m, c} values, λ, and learning rates for each environment. |
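
The paper's Algorithms 1 and 2 are given only as pseudocode built around a Bregman proximal (mirror descent) policy update. The sketch below is a minimal illustration of that update style, not the authors' exact BGPO implementation; the function names and the Euclidean-divergence defaults are assumptions for illustration.

```python
import torch

def mirror_descent_step(theta, grad, lam,
                        psi_grad=lambda x: x, psi_grad_inv=lambda y: y):
    """One Bregman proximal (mirror descent) step:
        theta_next = argmin_x <grad, x> + (1/lam) * D_psi(x, theta).
    The first-order optimality condition gives
        psi_grad(theta_next) = psi_grad(theta) - lam * grad.
    With psi(x) = 0.5 * ||x||^2 (the defaults), this reduces to plain SGD.
    """
    with torch.no_grad():
        return psi_grad_inv(psi_grad(theta) - lam * grad)

# Toy usage: one step on a stand-in policy-gradient estimate.
theta = torch.zeros(4)
grad = torch.ones(4)
theta = mirror_descent_step(theta, grad, lam=0.1)
print(theta)  # -0.1 in every coordinate, matching the SGD special case
```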
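Since exact environment IDs matter for reproduction, the following snippet instantiates the three classic-control tasks quoted above by their standard gym registry names; it assumes the pre-0.26 gym step API (4-tuple returns) that garage-era code targets.

```python
import gym

# Classic-control tasks named in the paper, by gym registry ID.
for env_id in ["CartPole-v1", "Acrobot-v1", "MountainCarContinuous-v0"]:
    env = gym.make(env_id)
    obs = env.reset()
    # Pre-0.26 gym API: step() returns (obs, reward, done, info).
    obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
    print(env_id, "instantiated ok")
```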
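Because the paper pins no dependency versions, a reproducer has to record them locally. A minimal sketch, assuming the PyPI package names garage and torch:

```python
import importlib.metadata as md

# Log installed versions of the two dependencies the paper names.
for pkg in ["garage", "torch"]:
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```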