Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

Authors: Shen Gao, Zhengliang Shi, Minghang Zhu, Bowen Fang, Xin Xin, Pengjie Ren, Zhumin Chen, Jun Ma, Zhaochun Ren

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on both controlled and real-world settings demonstrate the superiority of our tool learning framework in real-world application scenarios compared to both tuning-free (e.g., ChatGPT, Claude) and tuning-based baselines (e.g., GPT4Tools).
Researcher Affiliation | Academia | Shen Gao¹*, Zhengliang Shi¹*, Minghang Zhu¹, Bowen Fang¹, Xin Xin¹, Pengjie Ren¹, Zhumin Chen¹, Jun Ma¹, Zhaochun Ren²; ¹Shandong University, Qingdao, China; ²Leiden University, Leiden, The Netherlands
Pseudocode | No | The paper describes its methods verbally and through equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement about releasing its implementation code.
Open Datasets | No | The paper describes constructing its own tool-use dataset by prompting ChatGPT and manually building a seed instance pool, but it does not provide concrete access information (link, DOI, or specific citation) indicating that this dataset is publicly available.
Dataset Splits | No | The paper mentions using a 'training set' and 'test sets' ('Seen' and 'Unseen', each with 2,000 instances), but it does not specify the overall dataset size or explicit percentages/counts for training, validation, and test splits, nor does it refer to predefined splits with citations.
Hardware Specification | Yes | The training of our model can be done within 20 hours with 4 NVIDIA A100-PCIE-80GB GPUs.
Software Dependencies | No | The paper mentions the 'DeepSpeed ZeRO strategy' and 'LLaMA-7B' as the base model, but does not provide specific version numbers for software libraries or dependencies such as Python, PyTorch, or DeepSpeed itself.
Experiment Setup | Yes | We optimize the model using the DeepSpeed ZeRO strategy (Rasley et al. 2020) with the learning rate of 5e-5 and the weight decay coefficient of 0.01.
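
For readers attempting to reproduce the setup, the hyperparameters quoted above (learning rate 5e-5, weight decay 0.01, LLaMA-7B base model, DeepSpeed ZeRO on 4 A100 GPUs) can be expressed as a standard fine-tuning configuration. The sketch below assumes HuggingFace Transformers with a DeepSpeed ZeRO config file; since the paper releases no code, the model identifier, batch size, epoch count, and file paths are illustrative placeholders rather than the authors' actual values.

```python
# Hypothetical reconstruction of the reported fine-tuning setup.
# Only learning_rate, weight_decay, the LLaMA-7B base model, and the use of
# DeepSpeed ZeRO come from the paper; everything else is a placeholder.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "huggyllama/llama-7b"            # assumed LLaMA-7B checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="confucius-tool-llm",          # hypothetical output directory
    learning_rate=5e-5,                       # reported in the paper
    weight_decay=0.01,                        # reported in the paper
    per_device_train_batch_size=4,            # not reported; placeholder
    num_train_epochs=3,                       # not reported; placeholder
    bf16=True,                                # common choice on A100 GPUs
    deepspeed="ds_zero2_config.json",         # path to a DeepSpeed ZeRO config
)

# The tool-use dataset is not released, so the training call is left commented:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=tool_train_dataset, tokenizer=tokenizer)
# trainer.train()
```

Launching such a script with the DeepSpeed launcher (e.g., deepspeed --num_gpus 4 train.py) would match the 4-GPU configuration quoted in the hardware row, assuming ds_zero2_config.json holds a standard ZeRO configuration.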