Natural Language Instruction-following with Task-related Language Development and Translation
Authors: Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Xiong-Hui Chen, Yang Yu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments to evaluate the effectiveness of TALAR and answer the following questions: (1) How does TALAR perform compared to existing NLC-RL approaches when learning an instruction-following policy? (Section 5.1) (2) Can TALAR learn effective task language? (Section 5.2) (3) Can TL acquire any compositional structure and serve as an abstraction for hierarchical RL? (Section 5.3) (4) What is the impact of each component on the overall performance of TALAR? (Section 5.4) |
| Researcher Affiliation | Collaboration | National Key Laboratory of Novel Software Technology, Nanjing University; Polixir Technology. {pangjc,yangxy,yangsh,chenxh}@lamda.nju.edu.cn, yuy@nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training procedure of the TL generator. Algorithm 2 Training procedure of the translator. Algorithm 3 Training procedure of the instruction-following policy. |
| Open Source Code | No | The paper mentions using "the open-sourced RL repository, stable-baselines3 [65]" for implementation but does not provide a link to, or a statement about releasing, the authors' own research code. |
| Open Datasets | Yes | We conduct experiments in Franka Kitchen [8] and CLEVR-Robot [9] environments, as shown in Fig. 3. |
| Dataset Splits | No | The paper states "We split the NL instructions into two tasks: training and the testing set" but does not specify a validation set or the exact split percentages for the datasets, which would be needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions "we utilize the open-sourced RL repository, stable-baselines3 [65]" but does not provide specific version numbers for this or other software dependencies. |
| Experiment Setup | Yes | The hyper-parameters for implementing TALAR are presented in Table 2. When implementing baseline methods, we use the same hyper-parameters of PPO for policy learning. |