Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
Authors: Shenao Zhang, Wanxin Jin, Zhaoran Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on various robotic tasks are provided to support our theory and method. |
| Researcher Affiliation | Academia | ¹Northwestern University, Evanston, IL, USA; ²Arizona State University, Tempe, AZ, USA. |
| Pseudocode | Yes | We provide the pseudocode of Adaptive Barrier Smoothing in Algorithm 1. By adopting ABS to compute the policy gradient in (4.1) and following Algorithm 2 as the main training loop, we obtain the FOPG-ABS method. |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code open, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions running experiments in the 'Dojo (Howell et al., 2022) physics engine' and discusses 'Dojo locomotion tasks'. It also refers to collecting an 'evaluation transition dataset' in Section 8.4. However, it does not provide concrete access information (e.g., a direct link, DOI, or specific citation to a public dataset with access details) for any dataset used for training or evaluation. |
| Dataset Splits | No | The paper does not specify exact training, validation, or test dataset splits (e.g., percentages or counts). It mentions using an 'evaluation transition dataset' but no specific split details are provided. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Dojo (Howell et al., 2022) physics engine' and implies the use of 'Python' in Appendix C.1. However, it does not provide specific version numbers for Dojo, Python, or any other critical software libraries or dependencies, which would be necessary for reproduction. |
| Experiment Setup | Yes | In our Dojo experiments in Section 8.3, we use a contact-aware central-path parameter for the proposed Adaptive Barrier Smoothing method. From the results in Figure 3, to balance the gradient variance and bias, we would like $\mu \to 0$ when all impact contacts are active or the distance-to-obstacle is large, and $\mu \to 10^{-2}$ when this distance approaches zero. To accomplish this, the adaptive $\mu(x_t, u_t)$ is designed as $\mu(x_t, u_t) = \frac{10^{-2}}{(100 d^2 + 1)^4} = \frac{10^{-2}}{\left(100 \min_{i \in I} \lvert \phi(x_t, u_t)^{(i)} \rvert^2 + 1\right)^4}$. (C.2) |
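
The adaptive central-path parameter quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the extracted formula reads µ = 10⁻² / (100 d² + 1)⁴ with d = min over i in I of |ϕ(x_t, u_t)⁽ⁱ⁾|, and the function name `adaptive_mu`, the constant names, and the list-of-distances input are hypothetical.

```python
from typing import Sequence

# Constants as read from the (garbled) equation C.2; treat them as assumptions.
MU_AT_CONTACT = 1e-2   # value of mu when the distance d is exactly zero
DIST_SCALE = 100.0     # coefficient multiplying d^2 in the denominator
DENOM_POWER = 4        # exponent applied to the whole denominator


def adaptive_mu(phi: Sequence[float]) -> float:
    """Return the contact-aware central-path parameter mu.

    `phi` holds the signed distances phi(x_t, u_t)^(i) for the impact
    contacts i in I (hypothetical input format; a real simulator would
    supply these values).
    """
    if len(phi) == 0:
        raise ValueError("phi must contain at least one contact distance")
    # d is the smallest absolute distance over all impact contacts.
    d = min(abs(p) for p in phi)
    return MU_AT_CONTACT / (DIST_SCALE * d * d + 1.0) ** DENOM_POWER


if __name__ == "__main__":
    # mu decays quickly as the closest contact moves away from the obstacle.
    for dist in (0.0, 0.05, 0.1, 0.5, 1.0):
        print(f"d = {dist:4.2f}  ->  mu = {adaptive_mu([dist, 2.0]):.3e}")
```

Under this reading, µ equals 10⁻² at contact and falls toward zero as the nearest contact distance grows, which matches the stated goal of smoothing more near contact and less away from it.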