AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning
Authors: Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | AutoManual significantly improves task success rates, achieving 97.4% with GPT-4-turbo and 86.2% with GPT-3.5-turbo on ALFWorld benchmark tasks. |
| Researcher Affiliation | Collaboration | Minghao Chen1, Yihang Li2, Yanting Yang3, Shiyu Yu5, Binbin Lin3,4, Xiaofei He2 — 1School of Computer Science, Hangzhou Dianzi University; 2State Key Lab of CAD&CG, Zhejiang University; 3School of Software Technology, Zhejiang University; 4Fullong Inc.; 5Ningbo Port Group |
| Pseudocode | Yes | Here is a code of my plan: python # Define a helper method to search receptacles for the target object def find_object(agent, recep_to_check, object_name): for receptacle in recep_to_check: observation = agent.go_to(receptacle) # Check if we need to open the receptacle. If we do, open it. if 'closed' in observation: observation = agent.open(receptacle) # Check if the object is in/on the receptacle. if object_name in observation: object_ids = get_object_with_id(observation, object_name) return object_ids, receptacle return None, None # [Step 1] Get a sorted list of receptacles and surfaces to check for a spraybottle. And use find_object method to search recep_to_check = ['cabinet_1', 'cabinet_2', 'cabinet_3', 'cabinet_4', 'countertop_1', 'toilet_1', 'toilet_2', 'sinkbasin_1', 'sinkbasin_2', 'garbagecan_1'] object_ids, receptacle_with_spraybottle = find_object(agent, recep_to_check, 'spraybottle') assert object_ids is not None, f'Error in [Step 1]: There is no spraybottle in/on {recep_to_check}.' # [Step 2] Take the spraybottle found_spraybottle = object_ids[0] observation = agent.take_from(found_spraybottle, receptacle_with_spraybottle) assert agent.holding == found_spraybottle, f'Error in [Step 2]: I cannot take {found_spraybottle} from {receptacle_with_spraybottle}.' # [Step 3] Go to a toilet and put the spraybottle on it observation = agent.go_to('toilet_1') # Check if toilet_1 is closed. If so, open it. if 'closed' in observation: observation = agent.open('toilet_1') observation = agent.put_in_or_on(found_spraybottle, 'toilet_1') |
| Open Source Code | Yes | The code is available at https://github.com/minghchen/automanual. |
| Open Datasets | Yes | (1) ALFWorld [17] is a text-based virtual household environment containing six distinct task types. We run the building stage on 36 tasks (6 tasks for each task type) sampled from the training set of ALFWorld, and each task is run only once. |
| Dataset Splits | Yes | Following previous works [16, 21, 33], we run the testing stage on the validation unseen set containing 134 tasks across these six types. |
| Hardware Specification | No | The paper mentions using GPT-4-turbo and GPT-3.5-turbo models, and notes "The API call cost for building and formulating stages is about $14 in total." However, it does not specify the underlying hardware (e.g., CPU, GPU models, memory) used to run these models or the experiments. |
| Software Dependencies | No | The paper mentions using "OpenAI's Assistant API" and specific GPT models like "GPT-4-turbo (gpt-4-1106-preview)" and "GPT-3.5-turbo (gpt-3.5-turbo-1106)". While these have version indicators, they refer to API services rather than specific software libraries or programming language versions (e.g., Python, PyTorch) installed with explicit version numbers. |
| Experiment Setup | Yes | In the building stage of all experiments, the maximum number of rules was set to 12 to balance the context length and rule diversity. We set all LLM temperatures to 0 and the maximum context length to 16000. For AdaPlanner and AutoManual, we allow the Planner agent to replan at most 3 times on ALFWorld and 6 times on MiniWob++ in response to the environmental feedback. |
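To make the Planner's plan from the Pseudocode row above concretely checkable, here is a minimal runnable sketch of its `find_object` search loop against a toy stand-in for the ALFWorld agent. The `MockAgent` class, its canned observation strings, and the token-based `get_object_with_id` helper are all assumptions for illustration; the real environment returns richer text observations through a different interface.

```python
class MockAgent:
    """Toy stand-in (assumed) for the ALFWorld agent interface used in the plan."""

    def __init__(self):
        self.holding = None

    def go_to(self, receptacle):
        # Pretend cabinet_2 is closed and hides the spraybottle.
        if receptacle == "cabinet_2":
            return "cabinet_2 is closed"
        return f"on {receptacle}, you see nothing"

    def open(self, receptacle):
        return f"you open {receptacle}; you see a spraybottle_1"

    def take_from(self, obj, receptacle):
        self.holding = obj
        return f"you take {obj} from {receptacle}"


def get_object_with_id(observation, object_name):
    # Extract instance tokens like "spraybottle_1" from an observation string.
    return [tok.strip(".,;") for tok in observation.split()
            if tok.startswith(object_name)]


def find_object(agent, recep_to_check, object_name):
    # Visit each receptacle; open it if closed; return matching object ids.
    for receptacle in recep_to_check:
        observation = agent.go_to(receptacle)
        if "closed" in observation:
            observation = agent.open(receptacle)
        if object_name in observation:
            return get_object_with_id(observation, object_name), receptacle
    return None, None


agent = MockAgent()
object_ids, receptacle = find_object(agent, ["cabinet_1", "cabinet_2"], "spraybottle")
assert object_ids is not None, "no spraybottle found in the checked receptacles"
```

Under this mock, the loop skips `cabinet_1`, opens the closed `cabinet_2`, and returns `["spraybottle_1"]` with its receptacle, mirroring Step 1 of the plan.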