Position: Towards Unified Alignment Between Agents, Humans, and Environment

Authors: Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile Wang, Zeyuan Yang, Qingyuan Hu, Xinrui Chen, Zhenhe Zhang, Fuwen Luo, Zhicheng Guo, Peng Li, Yang Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conduct proof-of-concept studies by introducing realistic features to WebShop (Yao et al., 2022a)... We then follow the principles of UA2 to propose an initial design of our agent and benchmark its performance against several candidate baselines in the retrofitted WebShop. The extensive experimental results further demonstrate the importance of the principles of UA2.
Researcher Affiliation | Collaboration | (1) Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China; (2) Institute for AI Industry Research (AIR), Tsinghua University, Beijing, China; (3) Jiangsu Collaborative Innovation Center for Language Competence, Jiangsu, China.
Pseudocode | Yes | The collaborative filtering and DPP-based (determinantal point process) reranking algorithms are outlined in Algorithms 1 and 2, respectively.
Open Source Code | Yes | The code for the retrofitted WebShop environment is available at https://github.com/AgentForceTeamOfficial/UA2-WebShop. The code for our agent design is available at https://github.com/AgentForceTeamOfficial/UA2-Agent.
Open Datasets | Yes | WebShop is a simulated online shopping environment with 1.18M real-world shopping items gathered from Amazon and 12,087 textual shopping instructions collected from human annotators.
Dataset Splits | Yes | We evaluate our method and the baseline methods across all 10 users on our retrofitted WebShop, with 50 tasks per user, except for LATS, which is evaluated with only one user due to its high cost. All methods are tested in each of the following three environments: (1) the fully retrofitted environment...; (2) the ablated environment that excludes human intentions...; (3) the ablated environment that excludes environmental dynamics...
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies | Yes | We employ ChatGPT (gpt-3.5-turbo-1106) as an assistant to simulate 30 different users and gather their preference data... All methods use gpt-3.5-turbo-instruct-0914 as the underlying model for their agents, except for LATS, for which we use gpt-3.5-turbo-1106 to keep the same setting as the original paper.
Experiment Setup | Yes | In executing each task, we limit interaction with the web to a maximum of 15 steps, including any invalid actions. For ReAct, Reflexion, and our method, we set the temperature to 0.0. For ReAct-SC, we set the number of samples k to 3 and the temperature to 0.05... For CoT and CoT-L2M, we also set the temperature to 0.0; for CoT-SC, we likewise set k = 3 and the temperature to 0.05. To match the settings of Zhou et al. (2023a), we set the temperature to 1.0, k to 5, and the number of iterations n to 30 for LATS.
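Algorithms 1 and 2 themselves are not reproduced in this report. As a rough illustration of the two techniques named in the Pseudocode row, the Python sketch below implements generic user-based collaborative filtering (cosine similarity over a user-item matrix) and naive greedy MAP inference for a DPP whose kernel is quality times similarity. The function names `cf_scores` and `dpp_rerank`, the kernel construction, and the input formats are hypothetical choices for illustration; they are not taken from the UA2-WebShop repository and may differ from the authors' algorithms.

```python
import numpy as np


def cf_scores(ratings: np.ndarray, user: int) -> np.ndarray:
    """Hypothetical user-based collaborative filtering.

    ratings: (num_users, num_items) implicit-feedback matrix (e.g. 1 = purchased).
    Returns one predicted score per item: the cosine-similarity-weighted
    average of the other users' rows.
    """
    norms = np.linalg.norm(ratings, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no history
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = 0.0  # exclude the target user from their own neighbourhood
    denom = np.abs(sims).sum()
    if denom == 0:
        return np.zeros(ratings.shape[1])
    return sims @ ratings / denom


def dpp_rerank(quality: np.ndarray, features: np.ndarray, k: int) -> list[int]:
    """Naive greedy MAP inference for a DPP with kernel L = diag(q) S diag(q).

    quality: (n,) positive relevance scores; features: (n, d) item embeddings.
    Each step adds the item that maximises log det of the selected submatrix,
    which trades off high quality against similarity to items already chosen.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T  # cosine similarity matrix (positive semidefinite)
    kernel = quality[:, None] * sim * quality[None, :]
    selected: list[int] = []
    for _ in range(min(k, len(quality))):
        best, best_val = -1, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            val = logdet if sign > 0 else -np.inf  # singular sets are never chosen
            if val > best_val:
                best, best_val = i, val
        if best < 0:  # every remaining item would make the submatrix singular
            break
        selected.append(best)
    return selected
```

For example, with two identical items and one distinct item, `dpp_rerank` keeps the higher-quality duplicate and then prefers the distinct item over the second duplicate. In a personalized-search setting one could pass `cf_scores` output as the quality vector; whether the retrofitted WebShop combines the two in exactly this way is not stated here. The recomputation of each determinant is deliberately simple; incremental Cholesky updates are the usual way to make greedy DPP inference fast.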
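The settings reported in the Dataset Splits, Software Dependencies, and Experiment Setup rows can be gathered into one configuration table. The Python sketch below does this and computes the implied number of task episodes per method. The dataclass, field names, environment labels, and the "Ours" key are hypothetical; settings elided in the report (the "..." after the ReAct-SC values) are not filled in; and the episode count assumes each user's 50 tasks are run once in each of the three environments.

```python
from dataclasses import dataclass
from typing import Optional

MAX_STEPS = 15  # maximum web interactions per task, invalid actions included
TASKS_PER_USER = 50
# Hypothetical labels for the full environment and the two ablations.
ENVIRONMENTS = ("full", "no_human_intentions", "no_env_dynamics")
BASE_MODEL = "gpt-3.5-turbo-instruct-0914"


@dataclass(frozen=True)
class MethodConfig:
    model: str
    temperature: float
    num_samples: int = 1              # k as reported; 1 for single-sample methods
    iterations: Optional[int] = None  # LATS search iterations n
    num_users: int = 10               # LATS is evaluated with a single user


CONFIGS = {
    "ReAct": MethodConfig(BASE_MODEL, 0.0),
    "Reflexion": MethodConfig(BASE_MODEL, 0.0),
    "Ours": MethodConfig(BASE_MODEL, 0.0),
    "ReAct-SC": MethodConfig(BASE_MODEL, 0.05, num_samples=3),
    "CoT": MethodConfig(BASE_MODEL, 0.0),
    "CoT-L2M": MethodConfig(BASE_MODEL, 0.0),
    "CoT-SC": MethodConfig(BASE_MODEL, 0.05, num_samples=3),
    "LATS": MethodConfig("gpt-3.5-turbo-1106", 1.0, num_samples=5,
                         iterations=30, num_users=1),
}


def num_episodes(cfg: MethodConfig) -> int:
    """Task episodes for one method, assuming each user's tasks run once per environment."""
    return cfg.num_users * TASKS_PER_USER * len(ENVIRONMENTS)
```

Under that assumption, `num_episodes(CONFIGS["ReAct"])` gives 10 × 50 × 3 = 1500 episodes, while LATS gives 1 × 50 × 3 = 150, which makes the cost-driven reduction for LATS explicit.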