Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Transferring Reasoning Capabilities between LLMs operating via Curriculum Learning Policy
Authors: Leonardo Ranaldi, Giulia Pucci, Fabio Massimo Zanzotto
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a comprehensive analysis using four question-answering benchmarks. The results show that SLMs can be instructed to reason via demonstrations delivered by LLMs. |
| Researcher Affiliation | Academia | Leonardo Ranaldi (EMAIL), University of Edinburgh and University of Rome Tor Vergata; Giulia Pucci (EMAIL), University of Aberdeen; Fabio Massimo Zanzotto (EMAIL), University of Rome Tor Vergata |
| Pseudocode | No | The paper describes the method and steps using prose and mathematical formulas (e.g., equations 1-7) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | All code is available in the supplementary material, to be released if accepted. |
| Open Datasets | Yes | CSQA (Talmor et al., 2019): huggingface.co/datasets/commonsense_qa; OBQA (Mihaylov et al., 2018): huggingface.co/datasets/openbookqa; PIQA (Bisk et al., 2019): huggingface.co/datasets/piqa; SIQA (Sap et al., 2019): huggingface.co/datasets/social_i_qa |
| Dataset Splits | Yes | Since an open-source test split is not available for all benchmarks, we adopt the following strategy: we use 4000 examples with equally distributed target classes as training data and the validation splits found on Hugging Face as test data. In Table 9, we report the quantitative information, global, and splitting ratios. |
| Hardware Specification | Yes | We conducted our experiments on a workstation equipped with two Nvidia RTX A6000 with 48GB of VRAM. |
| Software Dependencies | Yes | Llama-3-1: meta-llama/Llama-3.2-1B-Instruct; Llama-3-8: meta-llama/Meta-Llama-3-8B-Instruct; Llama-3-70: meta-llama/Meta-Llama-3-70B; Mistral-7: mistralai/Mistral-7B-Instruct-v0.2; GPT-4o: OpenAI API (gpt-4o-2024-08-06) |
| Experiment Setup | Yes | We follow the training approach proposed in Alpaca (Taori et al., 2023) and trained the models for 3 epochs, setting the learning rate to 0.00002 with 0.001 weight decay. |
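The dataset-split strategy and the reported hyperparameters above can be sketched in plain Python. This is a hedged illustration of the described procedure (a class-balanced training sample of 4000 examples, with the Hugging Face validation splits reserved as test data); the function name, signature, and config key names are illustrative and not taken from the paper's code.

```python
import random
from collections import defaultdict

def balanced_train_split(pairs, n_train=4000, seed=0):
    """Sample a training set of n_train examples with equally
    distributed target classes, as described in the paper.

    pairs: list of (example, label) tuples.
    """
    by_label = defaultdict(list)
    for ex, y in pairs:
        by_label[y].append((ex, y))
    per_class = n_train // len(by_label)  # equal share per target class
    rng = random.Random(seed)
    train = []
    for y in sorted(by_label):
        bucket = by_label[y]
        rng.shuffle(bucket)
        train.extend(bucket[:per_class])
    rng.shuffle(train)
    return train

# Reported fine-tuning hyperparameters (Alpaca recipe, Taori et al. 2023);
# dictionary key names are illustrative.
training_config = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,   # 0.00002 as reported
    "weight_decay": 0.001,
}
```

With a binary-labeled benchmark of 6000 examples, this yields a 4000-example training set with 2000 examples per class.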