Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Transferring Reasoning Capabilities between LLMs operating via Curriculum Learning Policy
Authors: Leonardo Ranaldi, Giulia Pucci, Fabio Massimo Zanzotto
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a comprehensive analysis using four question-answering benchmarks. The results show that SLMs can be instructed to reason via demonstrations delivered by LLMs. |
| Researcher Affiliation | Academia | Leonardo Ranaldi (EMAIL), University of Edinburgh and University of Rome Tor Vergata; Giulia Pucci (EMAIL), University of Aberdeen; Fabio Massimo Zanzotto (EMAIL), University of Rome Tor Vergata |
| Pseudocode | No | The paper describes the method and steps using prose and mathematical formulas (e.g., equations 1-7) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | All code is available in the supplementary material, to be released if accepted. |
| Open Datasets | Yes | CSQA (Talmor et al., 2019): huggingface.co/datasets/commonsense_qa; OBQA (Mihaylov et al., 2018): huggingface.co/datasets/openbookqa; PIQA (Bisk et al., 2019): huggingface.co/datasets/piqa; SIQA (Sap et al., 2019): huggingface.co/datasets/social_i_qa |
| Dataset Splits | Yes | Since an open-source test split is not available for all benchmarks, we adopt the following strategy: we use 4000 examples with equally distributed target classes as training data and the validation splits found on Hugging Face as test data. In Table 9, we report the quantitative information, global, and splitting ratios. |
| Hardware Specification | Yes | We conducted our experiments on a workstation equipped with two Nvidia RTX A6000 with 48GB of VRAM. |
| Software Dependencies | Yes | Llama-3-1: meta-llama/Llama-3.2-1B-Instruct; Llama-3-8: meta-llama/Meta-Llama-3-8B-Instruct; Llama-3-70: meta-llama/Meta-Llama-3-70B; Mistral-7: mistralai/Mistral-7B-Instruct-v0.2; GPT-4o: OpenAI API (gpt-4o-2024-08-06) |
| Experiment Setup | Yes | We follow the training approach proposed in Alpaca (Taori et al., 2023) and trained the models for 3 epochs, setting the learning rate to 0.00002 with 0.001 weight decay. |
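The dataset-split strategy and the reported hyperparameters above can be sketched in plain Python. This is a hedged illustration of the described procedure (a class-balanced training sample of 4000 examples, with the Hugging Face validation splits reserved as test data); the function name, signature, and config key names are illustrative and not taken from the paper's code.

```python
import random
from collections import defaultdict

def balanced_train_split(pairs, n_train=4000, seed=0):
    """Sample a training set of n_train examples with equally
    distributed target classes, as described in the paper.

    pairs: list of (example, label) tuples.
    """
    by_label = defaultdict(list)
    for ex, y in pairs:
        by_label[y].append((ex, y))
    per_class = n_train // len(by_label)  # equal share per target class
    rng = random.Random(seed)
    train = []
    for y in sorted(by_label):
        bucket = by_label[y]
        rng.shuffle(bucket)
        train.extend(bucket[:per_class])
    rng.shuffle(train)
    return train

# Reported fine-tuning hyperparameters (Alpaca recipe, Taori et al. 2023);
# dictionary key names are illustrative.
training_config = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,   # 0.00002 as reported
    "weight_decay": 0.001,
}
```

With a binary-labeled benchmark of 6000 examples, this yields a 4000-example training set with 2000 examples per class.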