Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Egocentric Planning for Scalable Embodied Task Achievement
Authors: Xiaotian Liu, Hector Palacios, Christian Muise
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our approach in ALFRED, a simulated environment designed for domestic tasks, and demonstrated its high scalability, achieving an impressive 36.07% unseen success rate in the ALFRED benchmark and winning the ALFRED challenge at CVPR Embodied AI workshop. |
| Researcher Affiliation | Collaboration | Xiaotian Liu, ServiceNow Research, Montreal, QC, Canada, xiaotian.liu@mail.utoronto.ca; Hector Palacios, ServiceNow Research, Montreal, QC, Canada, hectorpal@gmail.com; Christian Muise, Queen's University, Kingston, ON, Canada, christian.muise@queensu.ca |
| Pseudocode | Yes | Algorithm 1 Iterative Exploration Replanning (IER) |
| Open Source Code | No | The paper mentions using a pre-trained model provided by FILM and converting its template-based result, but does not provide an explicit statement or link to their own open-source code for the methodology described. |
| Open Datasets | Yes | We evaluated our approach in ALFRED, a simulated environment designed for domestic tasks, and demonstrated its high scalability, achieving an impressive 36.07% unseen success rate in the ALFRED benchmark and winning the ALFRED challenge at CVPR Embodied AI workshop. |
| Dataset Splits | Yes | The ALFRED dataset contains a validation dataset which is split into 820 Validation Seen episodes and 821 Validation Unseen episodes. |
| Hardware Specification | Yes | Perception and Language module was fine-tuned on a Nvidia 3080. |
| Software Dependencies | No | The paper mentions models like U-Net and Mask-RCNN and using pre-trained FILM models, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The random exploration step is used to generate diverse set of object cluster for further exploration using our planner. Subsequently, at t = 500, the gathered information from the semantic spatial graph is converted into a PDDL problem for the agent. ... we allow our agent to first conduct 500 random exploration movements. |
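
The schedule quoted in the Experiment Setup row (500 random exploration movements, then conversion of the gathered information into a PDDL problem) can be illustrated with a minimal Python sketch. This is not the authors' code: the action names, the `toy_step` simulator stand-in, the `at` predicate, and the `toy-task` / `toy-domain` names are all hypothetical, and a flat dictionary stands in for the paper's semantic spatial graph. Only the 500-step budget comes from the quoted text.

```python
import random

EXPLORATION_STEPS = 500  # from the quoted setup: 500 random exploration movements
ACTIONS = ["MoveAhead", "RotateLeft", "RotateRight"]  # hypothetical action names


def random_exploration(step_fn, steps=EXPLORATION_STEPS, seed=0):
    """Take `steps` random actions and record every (object, location) seen."""
    rng = random.Random(seed)
    observed = {}  # simplified stand-in for the semantic spatial graph
    for t in range(steps):
        action = rng.choice(ACTIONS)
        for obj, loc in step_fn(action, t):
            observed[obj] = loc
    return observed


def to_pddl_problem(observed, goal, name="toy-task", domain="toy-domain"):
    """Serialize the observations into a small PDDL problem string."""
    objs = sorted(observed)
    locs = sorted(set(observed.values()))
    init = " ".join(f"(at {o} {observed[o]})" for o in objs)
    lines = [
        f"(define (problem {name})",
        f"  (:domain {domain})",
        "  (:objects " + " ".join(objs + locs) + ")",
        f"  (:init {init})",
        f"  (:goal {goal})",
        ")",
    ]
    return "\n".join(lines)


def toy_step(action, t):
    """Hypothetical simulator stand-in: reveals one object every 100 steps."""
    layout = [("mug", "counter"), ("apple", "fridge"), ("knife", "drawer")]
    if t % 100 == 0 and t // 100 < len(layout):
        return [layout[t // 100]]
    return []


observed = random_exploration(toy_step)
problem = to_pddl_problem(observed, "(at mug fridge)")
print(problem)
```

In this sketch the planner would only be invoked after the exploration budget is exhausted, mirroring the quoted "at t = 500" hand-off; how the real system builds its graph and replans is described in the paper itself.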