Simple Emergent Action Representations from Multi-Task Policy Training
Authors: Pu Hua, Yubei Chen, Huazhe Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that the proposed action representations are effective for intra-action interpolation and inter-action composition with limited or no additional learning. |
| Researcher Affiliation | Collaboration | Pu Hua (1,4), Yubei Chen (2), Huazhe Xu (1,3,4); 1 Tsinghua University, 2 Center for Data Science, New York University, 3 Shanghai AI Lab, 4 Shanghai Qi Zhi Institute |
| Pseudocode | Yes | Algorithm 1 Multi-task Training |
| Open Source Code | No | Project page: https://sites.google.com/view/emergent-action-representation/ ("Animated results are shown in the project page"). The project page is a demo/results page, not explicitly a source code repository, and the paper does not explicitly state that code is available there. |
| Open Datasets | Yes | We evaluate our method on five locomotion control environments (Half-Cheetah-Vel, Ant-Dir, Hopper-Vel, Walker-Vel, Half-Cheetah-Run-Jump) based on OpenAI Gym and the MuJoCo simulator. |
| Dataset Splits | Yes | Half-Cheetah-Vel (Uni-modal): the half-cheetah agent is trained to run at a target velocity. The training task set contains 10 velocities, ranging from 1 m/s to 10 m/s, with one training task per 1 m/s. The adaptation task set contains 3 velocities uniformly sampled from [1, 10]. Section B.3.3 (Task Sampling Density) states that the range of task sampling is fixed across algorithms in the same environment, and the detailed implementation settings are given in Table 3. |
| Hardware Specification | No | The paper mentions using OpenAI Gym and the MuJoCo simulator for environments but does not specify any hardware details such as GPU/CPU models, memory, or cloud resources used for the experiments. |
| Software Dependencies | No | The paper refers to environments such as OpenAI Gym and the MuJoCo simulator, and to methods such as Soft Actor-Critic, but it does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | In this section, we provide detailed settings of our methods. We set up the hyperparameters, as shown in Table 2, for the environments and algorithms in the MuJoCo locomotion benchmarks. |
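The Dataset Splits row quotes a concrete task-set construction for Half-Cheetah-Vel: 10 training target velocities (1 m/s to 10 m/s, one task per 1 m/s) and 3 adaptation velocities sampled uniformly from [1, 10]. A minimal sketch of that split, assuming hypothetical names (`make_task_sets`, `velocity_reward`) and a common negative-absolute-error reward convention for *-Vel tasks, none of which are taken from the authors' code:

```python
import random

def make_task_sets(seed=0):
    """Build the Half-Cheetah-Vel task sets described in the paper's Appendix B.

    Hypothetical helper: 10 training velocities (1..10 m/s) and
    3 adaptation velocities drawn uniformly from [1, 10].
    """
    rng = random.Random(seed)
    train_velocities = [float(v) for v in range(1, 11)]       # one task per 1 m/s
    adapt_velocities = [rng.uniform(1.0, 10.0) for _ in range(3)]
    return train_velocities, adapt_velocities

def velocity_reward(forward_velocity, target_velocity):
    # Assumed reward shape (not stated in the quoted text): negative
    # absolute error between achieved and target forward velocity.
    return -abs(forward_velocity - target_velocity)

train, adapt = make_task_sets()
```

This only illustrates the split sizes and sampling range reported in the table; the actual environment wrappers and reward functions are defined in the paper's benchmark code.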