AED: Adaptable Error Detection for Few-shot Imitation Policy
Authors: Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston Hsu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct thorough experiments on the proposed benchmark. Even when faced with various policy behaviors and different characteristics of strong baselines, our PrObe still achieved the highest Top-1 counts, average ranking, and average performance difference (with a maximum difference of up to 40%), demonstrating its superiority. Additionally, we conducted an extensive ablative study to justify the effectiveness of our design choices. Furthermore, we reported additional experimental results covering timing accuracy, embedding visualization, demonstration quality, viewpoint changes, and error correction to validate our claims and the practicality of the AED task. |
| Researcher Affiliation | Collaboration | 1 National Taiwan University; 2 University of California, Berkeley; 3 National Yang Ming Chiao Tung University; 4 Mobile Drive |
| Pseudocode | No | The paper describes its methods verbally and with architectural diagrams, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The AED project page can be found at https://aed-neurips.github.io/. (On this page, it states 'Code and Data will be released soon.') Also, in the NeurIPS Paper Checklist, section 'Open access to data and code', it states 'We will release the code and data once everything is ready.' |
| Open Datasets | No | To this end, we develop a cross-domain AED benchmark, consisting of 322 base and 153 novel environments. Additionally, we propose Pattern Observer (PrObe) to address these challenges. (From Abstract) And from NeurIPS Paper Checklist - New Assets: 'We will document our proposed benchmark to make it easy for users to extend and leverage. Once our work is accepted, we will open-source it immediately.' |
| Dataset Splits | No | The paper describes training on 'base environments' and evaluating on 'novel environments' but does not provide explicit training/validation/test splits with percentages or sample counts, nor does it refer to predefined standard splits for its custom benchmark. |
| Hardware Specification | Yes | All policy experiments are conducted on an Ubuntu 20.04 machine equipped with an Intel i9-9900K CPU, 64GB RAM, and an NVIDIA RTX 3090 24GB GPU. |
| Software Dependencies | No | The paper mentions 'CoppeliaSim [13]' and 'PyRep [14]' as the development environment but does not specify their version numbers or the versions of other software dependencies such as the programming language or deep-learning libraries (e.g., Python, PyTorch/TensorFlow). |
| Experiment Setup | Yes | To optimize the policies, we utilize an RMSProp optimizer [52] with a learning rate of 1e-4 and an L2 regularizer with a weight of 1e-2. Each training epoch involves iterating through all base environments. Within each iteration, we sample five demonstrations and ten rollouts from the sampled base environment. The total number of epochs varies depending on the specific tasks and is specified in Table 5. |
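The reported optimizer settings (RMSProp, learning rate 1e-4, L2 regularization weight 1e-2) can be reproduced directly in code. The sketch below is a minimal illustration assuming PyTorch, which the paper does not explicitly name; the `policy` network here is a hypothetical stand-in, not the paper's actual architecture. In PyTorch, the `weight_decay` argument of the optimizer implements the L2 penalty.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a few-shot imitation policy network;
# the paper's actual architecture differs.
policy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 8))

# RMSProp with lr = 1e-4 and L2 weight = 1e-2, matching the reported setup.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=1e-4, weight_decay=1e-2)

print(optimizer.param_groups[0]["lr"])            # 0.0001
print(optimizer.param_groups[0]["weight_decay"])  # 0.01
```

The outer training loop described in the quote (one epoch = one pass over all base environments, sampling five demonstrations and ten rollouts per environment) would wrap this optimizer; those sampling counts are the only other hyperparameters the section specifies.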