Simple Emergent Action Representations from Multi-Task Policy Training

Authors: Pu Hua, Yubei Chen, Huazhe Xu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that the proposed action representations are effective for intra-action interpolation and inter-action composition with limited or no additional learning (see the interpolation/composition sketch after this table).
Researcher Affiliation | Collaboration | Pu Hua (1,4), Yubei Chen (2), Huazhe Xu (1,3,4). Affiliations: 1 Tsinghua University; 2 Center for Data Science, New York University; 3 Shanghai AI Lab; 4 Shanghai Qi Zhi Institute.
Pseudocode | Yes | Algorithm 1: Multi-task Training (a hedged sketch of such a training loop follows the table).
Open Source Code | No | Project page: https://sites.google.com/view/emergent-action-representation/ ("Animated results are shown in the project page"). The project page is a demo/results page, and the paper does not explicitly state that source code is available there.
Open Datasets | Yes | "We evaluate our method on five locomotion control environments (Half Cheetah-Vel, Ant-Dir, Hopper-Vel, Walker-Vel, Half Cheetah-Run-Jump) based on OpenAI Gym and the MuJoCo simulator." (A sketch of a velocity-target environment wrapper follows the table.)
Dataset Splits | Yes | "Half Cheetah-Vel (Uni-modal): In this environment, we train the half-cheetah agent to run at a target velocity. The training task set contains 10 velocities. The target velocities during training range from 1 m/s to 10 m/s; for every 1 m/s, we set a training task. The adaptation task set contains 3 velocities that are uniformly sampled from [1, 10]." From B.3.3, Task Sampling Density: "In the experiments, we fix the range of task sampling to be the same for different algorithms in the same environment... The detailed settings of the implementation in our paper are demonstrated in Table 3." (A split-construction sketch follows the table.)
Hardware Specification | No | The paper mentions using OpenAI Gym and the MuJoCo simulator for its environments but does not specify hardware details such as GPU/CPU models, memory, or cloud resources used for the experiments.
Software Dependencies | No | The paper refers to environments such as OpenAI Gym and the MuJoCo simulator, and to methods such as Soft Actor-Critic, but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "In this section, we provide detailed settings of our methods. We set up the hyperparameters, as shown in Table 2, for the environments and algorithms in the MuJoCo locomotion benchmarks."
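On the Research Type row: the claimed intra-action interpolation and inter-action composition can be illustrated with a minimal sketch over learned task embeddings, assuming the shared policy is conditioned on a fixed-size latent vector. All names below (z_run, z_jump, interpolate, compose) are hypothetical illustrations, not the authors' released code.

```python
# Minimal sketch: interpolating and composing learned task embeddings.
# Assumption: each trained task yields a fixed-size embedding z, and a
# shared policy pi(a | s, z) is conditioned on z at execution time.
import numpy as np

rng = np.random.default_rng(0)
z_run = rng.normal(size=16)   # stand-in for a learned "run" embedding
z_jump = rng.normal(size=16)  # stand-in for a learned "jump" embedding

def interpolate(z_a: np.ndarray, z_b: np.ndarray, alpha: float) -> np.ndarray:
    """Intra-action interpolation: convex blend of two embeddings from the
    same skill family (e.g., two target running velocities)."""
    return (1.0 - alpha) * z_a + alpha * z_b

def compose(z_a: np.ndarray, z_b: np.ndarray) -> np.ndarray:
    """Inter-action composition: one plausible reading is an additive
    combination of two skill embeddings (e.g., run-and-jump)."""
    return z_a + z_b

z_mid = interpolate(z_run, z_jump, alpha=0.5)   # new intermediate behavior
z_run_jump = compose(z_run, z_jump)             # new composed behavior
```

Conditioning the unchanged policy on z_mid or z_run_jump would then produce new behaviors without retraining, which is the sense in which the representations work "with limited or no additional learning."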
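On the Pseudocode row: the paper's Algorithm 1 is not reproduced here, so the following is only a plausible skeleton of a multi-task training loop, assuming one learnable embedding per training task, a shared policy, and a Soft Actor-Critic update (which the paper mentions elsewhere). The rollout and update functions are stubs; every name is hypothetical.

```python
# Hypothetical skeleton of multi-task policy training with per-task
# embeddings learned jointly with a shared policy.
import numpy as np

rng = np.random.default_rng(0)
num_tasks, embed_dim = 10, 16
task_embeddings = rng.normal(scale=0.1, size=(num_tasks, embed_dim))

def collect_rollout(task_id: int, z: np.ndarray) -> dict:
    """Stub: roll out the shared policy conditioned on z in the
    environment for task_id, returning a batch of transitions."""
    return {"obs": rng.normal(size=(8, 17)), "z": z}

def sac_update(batch: dict) -> np.ndarray:
    """Stub: one Soft Actor-Critic step on the shared actor/critics;
    here it only returns a placeholder gradient w.r.t. the embedding."""
    return rng.normal(scale=0.01, size=batch["z"].shape)

for step in range(100):
    task_id = int(rng.integers(num_tasks))        # sample a training task
    z = task_embeddings[task_id]
    batch = collect_rollout(task_id, z)           # interact with that task's env
    z_grad = sac_update(batch)                    # joint policy/embedding step
    task_embeddings[task_id] = z - 1e-3 * z_grad  # embeddings trained end-to-end
```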
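On the Open Datasets row: the base environments come from OpenAI Gym's MuJoCo suite, and the velocity-target variants (e.g., Half Cheetah-Vel) are presumably custom reward wrappers, which the excerpt does not spell out. Below is a minimal sketch of such a wrapper, assuming the pre-0.26 Gym step API and the v3 MuJoCo environments, whose step info dict reports "x_velocity"; the wrapper itself is hypothetical.

```python
# Hypothetical velocity-target wrapper over a standard Gym MuJoCo env.
# Assumes gym < 0.26 (4-tuple step) and a v3 env exposing info["x_velocity"].
import gym

class TargetVelocityWrapper(gym.Wrapper):
    """Rewards the agent for running close to a fixed target velocity."""

    def __init__(self, env: gym.Env, target_vel: float):
        super().__init__(env)
        self.target_vel = target_vel

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        # Replace the native reward with negative velocity-tracking error.
        reward = -abs(info["x_velocity"] - self.target_vel)
        return obs, reward, done, info

env = TargetVelocityWrapper(gym.make("HalfCheetah-v3"), target_vel=3.0)
```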
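On the Dataset Splits row: the Half Cheetah-Vel split described above is simple enough to state in code. The fixed seed below is an assumption for illustration; the paper does not report one.

```python
# Half Cheetah-Vel task split: 10 training velocities at 1 m/s increments
# and 3 adaptation velocities drawn uniformly from [1, 10].
import numpy as np

train_velocities = np.arange(1.0, 11.0)            # 1, 2, ..., 10 m/s
rng = np.random.default_rng(42)                    # seed is an assumption
adapt_velocities = rng.uniform(1.0, 10.0, size=3)  # held-out adaptation targets
```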