Natural Language Instruction-following with Task-related Language Development and Translation

Authors: Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Xiong-Hui Chen, Yang Yu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct a series of experiments to evaluate the effectiveness of TALAR and answer the following questions: (1) How does TALAR perform compared to existing NLC-RL approaches when learning an instruction-following policy? (Section 5.1) (2) Can TALAR learn effective task language? (Section 5.2) (3) Can TL acquire any compositional structure and serve as an abstraction for hierarchical RL? (Section 5.3) (4) What is the impact of each component on the overall performance of TALAR? (Section 5.4)"
Researcher Affiliation | Collaboration | National Key Laboratory of Novel Software Technology, Nanjing University; Polixir Technology. {pangjc,yangxy,yangsh,chenxh}@lamda.nju.edu.cn, yuy@nju.edu.cn
Pseudocode | Yes | Algorithm 1: Training procedure of the TL generator. Algorithm 2: Training procedure of the translator. Algorithm 3: Training procedure of the instruction-following policy.
Open Source Code | No | The paper mentions using "the open-sourced RL repository, stable-baselines3 [65]" for implementation, but provides no link to, or statement about releasing, its own research code.
Open Datasets | Yes | "We conduct experiments in Franka Kitchen [8] and CLEVR-Robot [9] environments, as shown in Fig. 3."
Dataset Splits | No | The paper states "We split the NL instructions into two tasks: training and the testing set" but specifies neither a validation set nor the exact split proportions for the datasets, which would be needed for reproduction.
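Because the split proportions are unreported, a reproduction must choose its own. A minimal sketch of such a split is below; the 80/20 ratio, the fixed seed, and the `split_instructions` helper are all assumptions for illustration, not details from the paper.

```python
import random


def split_instructions(instructions, test_ratio=0.2, seed=0):
    """Shuffle and split NL instructions into train/test sets.

    The 0.2 test ratio and the seed are assumed values; the paper
    does not report its exact split proportions.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(instructions)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Usage with placeholder instructions:
train, test = split_instructions([f"instr-{i}" for i in range(100)])
print(len(train), len(test))  # 80 20
```

Logging the seed alongside the resulting split sizes is what would let a later run reconstruct the same partition.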
Hardware Specification | No | The paper does not report hardware details such as GPU or CPU models, or cloud computing instance types, used for running the experiments.
Software Dependencies | No | The paper mentions "we utilize the open-sourced RL repository, stable-baselines3 [65]" but does not give version numbers for it or for other software dependencies.
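Since no versions are pinned, anyone attempting a reproduction would have to record their own environment. A small stdlib-only sketch for doing so is below; the `environment_report` helper and the particular package names queried are illustrative assumptions, not part of the paper's tooling.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Return the Python version plus installed versions of the given
    packages, marking any that are absent as 'not installed'."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Usage: packages a TALAR reproduction would plausibly depend on.
print(environment_report(["stable-baselines3", "torch", "gym"]))
```

Committing such a report next to experiment results gives later readers the version information the paper itself omits.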
Experiment Setup | Yes | "The hyper-parameters for implementing TALAR are presented in Table 2. When implementing baseline methods, we use the same hyper-parameters of PPO for policy learning."