Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Transferring Reasoning Capabilities between LLMs operating via Curriculum Learning Policy

Authors: Leonardo Ranaldi, Giulia Pucci, Fabio Massimo Zanzotto

TMLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We perform a comprehensive analysis using four question-answering benchmarks. The results show that SMLs can be instructed to reason via Demonstrations delivered by LLMs. |
| Researcher Affiliation | Academia | Leonardo Ranaldi (EMAIL), University of Edinburgh and University of Rome Tor Vergata; Giulia Pucci (EMAIL), University of Aberdeen; Fabio Massimo Zanzotto (EMAIL), University of Rome Tor Vergata |
| Pseudocode | No | The paper describes the method and its steps using prose and mathematical formulas (e.g., Equations 1-7) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | All code is available in the supplementary material, to be released if accepted. |
| Open Datasets | Yes | CSQA (Talmor et al., 2019): huggingface.co/datasets/commonsense_qa; OBQA (Mihaylov et al., 2018): huggingface.co/datasets/openbookqa; PIQA (Bisk et al., 2019): huggingface.co/datasets/piqa; SIQA (Sap et al., 2019): huggingface.co/datasets/social_i_qa |
| Dataset Splits | Yes | Since an open-source test split is not always available for every benchmark, we adopt the following strategy: we use 4000 examples with equally distributed target classes as training data and the validation splits found on Hugging Face as test data. Table 9 reports the quantitative information, totals, and splitting ratios. |
| Hardware Specification | Yes | We conducted our experiments on a workstation equipped with two Nvidia RTX A6000 GPUs with 48GB of VRAM. |
| Software Dependencies | Yes | Llama-3-1: meta-llama/Llama-3.2-1B-Instruct; Llama-3-8: meta-llama/Meta-Llama-3-8B-Instruct; Llama-3-70: meta-llama/Meta-Llama-3-70B; Mistral-7: mistralai/Mistral-7B-Instruct-v0.2; GPT-4o: OpenAI API (gpt-4o-2024-08-06) |
| Experiment Setup | Yes | We follow the training approach proposed in Alpaca (Taori et al., 2023) and trained the models for 3 epochs, with a learning rate of 0.00002 and a weight decay of 0.001. |
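The class-balanced split described under "Dataset Splits" (a 4000-example training set with equally distributed target classes, with the Hugging Face validation split held out as test data) can be sketched as follows. This is a minimal illustration, not the authors' code: the `"label"` key, the 5-way answer set, and the toy pool are assumptions made for the example.

```python
import random
from collections import defaultdict

def balanced_train_sample(examples, n_total, seed=0):
    """Sample n_total examples with equally distributed target classes.

    Mirrors the paper's reported strategy of a 4000-example balanced
    training set; `examples` is a list of dicts with a "label" key
    (a hypothetical schema chosen for this sketch).
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    per_class = n_total // len(by_label)  # equal share per target class
    rng = random.Random(seed)
    train = []
    for group in by_label.values():
        train.extend(rng.sample(group, per_class))
    rng.shuffle(train)
    return train

# Toy pool standing in for a 5-way multiple-choice benchmark such as CSQA.
pool = [{"id": i, "label": "ABCDE"[i % 5]} for i in range(10000)]
train = balanced_train_sample(pool, n_total=4000)
```

In practice the pool would come from the Hugging Face training split of each benchmark, with the published validation split kept aside as the test set.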
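One plausible way to express the reported hyperparameters (3 epochs, learning rate 0.00002, weight decay 0.001) is via Hugging Face's `TrainingArguments`; this is a hedged sketch of a configuration consistent with the paper's description, not the authors' actual training script, and the `output_dir` name is a placeholder.

```python
from transformers import TrainingArguments

# Config sketch only: maps the reported hyperparameters onto the
# Hugging Face Trainer API; "alpaca-style-sft" is a hypothetical name.
training_args = TrainingArguments(
    output_dir="alpaca-style-sft",
    num_train_epochs=3,     # "trained the models for 3 epochs"
    learning_rate=2e-5,     # 0.00002
    weight_decay=0.001,
)
```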