Metric Residual Network for Sample Efficient Goal-Conditioned Reinforcement Learning
Authors: Bo Liu, Yihao Feng, Qiang Liu, Peter Stone
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency. ... Experiments are designed to validate two hypotheses: 1) MRN achieves better sample efficiency compared to the baseline methods, and 2) d_sym and d_asym are both important in the design of MRN. |
| Researcher Affiliation | Collaboration | Bo Liu (1), Yihao Feng (2), Qiang Liu (1), Peter Stone (1,3); (1) The University of Texas at Austin, (2) Salesforce Research, (3) Sony AI; {bliu, yihao, lqiang, pstone}@cs.utexas.edu |
| Pseudocode | Yes | For the convenience of understanding and implementation of MRN, we provide the forward pass of MRN in PyTorch-like pseudocode in Alg. 1 in the Appendix. (A hedged PyTorch sketch of such a forward pass follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/Cranial-XIX/metric-residual-network. |
| Open Datasets | Yes | Benchmarks: We use the standard GCRL benchmarks (Plappert et al. 2018), including all manipulation tasks on the Fetch robot and the Shadow hand (see Fig. 4). (See the environment snippet below the table.) |
| Dataset Splits | No | The paper describes the number of training episodes per epoch (1000) and evaluation rollouts (100), but does not specify fixed train/validation/test dataset splits like percentages or sample counts for a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions using DDPG with HER and PyTorch-like pseudocode but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The monolithic network is a three-hidden-layer multi-layer perceptron (MLP) with 256 neurons per layer and ReLU activation (e.g. [linear-relu]×3 + linear). BVN has two separate networks f and ϕ, each of which is a three-layer MLP with 176 neurons per layer. For all other networks, we first define two encoders e1 and e2 (e.g. [linear-relu]×2). ... For MRN, both the metric part d_sym and the residual asymmetric part d_asym are single-hidden-layer neural networks with 176 neurons (e.g., linear-relu-linear). The actor network is the same as the monolithic critic network except that the output layer projects to the action space. ... For each architecture and each environment, we evaluate with 5 independent seeds {100, 200, 300, 400, 500}. The agent is trained on 1000 episodes of data each epoch. (These sizes are mirrored in the PyTorch sketch below.) |
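
To make the pseudocode and setup rows concrete, here is a minimal PyTorch sketch of an MRN-style critic. It is not the authors' Alg. 1: the class name `MRNCritic`, the (s, a)/(s, g) input wiring, and the exact head forms (a Euclidean metric for d_sym and a max-ReLU residual for d_asym, a standard quasimetric construction) are assumptions; only the layer sizes (two [linear-relu] encoder layers and single-hidden-layer heads with 176 neurons) come from the setup row above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRNCritic(nn.Module):
    """Hypothetical MRN-style critic: Q(s, a, g) = -(d_sym + d_asym).

    Layer sizes follow the paper's setup description; the head forms and
    the (s, a)/(s, g) input wiring are assumptions, not the authors' Alg. 1.
    """

    def __init__(self, obs_dim, act_dim, goal_dim, hidden=176):
        super().__init__()
        # Two encoders e1, e2 ([linear-relu] x 2, per the setup row).
        self.e1 = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.e2 = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        # Single-hidden-layer heads (linear-relu-linear) with 176 neurons.
        self.sym = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.asym = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, obs, act, goal):
        x = self.e1(torch.cat([obs, act], dim=-1))   # state-action embedding
        y = self.e2(torch.cat([obs, goal], dim=-1))  # state-goal embedding
        # Symmetric metric part: Euclidean distance between embeddings.
        d_sym = torch.norm(self.sym(x) - self.sym(y), dim=-1)
        # Asymmetric residual part: max-ReLU quasimetric (assumed form).
        d_asym = F.relu(self.asym(x) - self.asym(y)).max(dim=-1).values
        return -(d_sym + d_asym)  # Q-value as a negated distance

# FetchPush-like dimensions: 25-dim observation, 4-dim action, 3-dim goal.
critic = MRNCritic(obs_dim=25, act_dim=4, goal_dim=3)
q = critic(torch.randn(8, 25), torch.randn(8, 4), torch.randn(8, 3))  # shape (8,)
```

The ablation hypothesis in the Research Type row (that d_sym and d_asym are both important) amounts to dropping one of the two heads of such a critic and comparing learning curves.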
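For the Open Datasets row, the Plappert et al. (2018) suite is exposed through the classic `gym` robotics registry (the MuJoCo-backed robotics extras must be installed). A minimal sketch follows, assuming the original environment IDs; the version suffixes (-v1/-v0) vary across gym releases, and the list is a representative subset of the 12 tasks rather than the paper's exact roster.

```python
import gym

# Multi-goal Fetch and Shadow-hand manipulation tasks from
# Plappert et al. (2018); representative subset, IDs assumed.
TASKS = [
    "FetchReach-v1", "FetchPush-v1", "FetchSlide-v1", "FetchPickAndPlace-v1",
    "HandReach-v0", "HandManipulateBlock-v0", "HandManipulateEgg-v0",
    "HandManipulatePen-v0",
]

env = gym.make("FetchPush-v1")
env.seed(100)  # the paper evaluates with seeds {100, 200, 300, 400, 500}
obs = env.reset()
# Goal-conditioned observations are dicts: the policy conditions on
# `observation` and `desired_goal`; HER relabels with `achieved_goal`.
print(obs["observation"].shape, obs["desired_goal"].shape, obs["achieved_goal"].shape)
```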