Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Active Large Language Model-Based Knowledge Distillation for Session-Based Recommendation

Authors: Yingpeng Du, Zhu Sun, Ziyan Wang, Haoyan Chua, Jie Zhang, Yew-Soon Ong

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on real-world datasets show that our method significantly outperforms state-of-the-art methods for SBR. We conduct extensive experiments to evaluate the performance of ALKDRec and answer four research questions.
Researcher Affiliation	Academia	(1) College of Computing and Data Science, Nanyang Technological University, Singapore; (2) Information Systems Technology and Design, Singapore University of Technology and Design, Singapore; (3) A*STAR Center for Frontier AI Research, Singapore.
Pseudocode	No	The paper describes the proposed method conceptually and mathematically, including definitions, theorems, and equations, but does not contain a structured pseudocode or algorithm block.
Open Source Code	Yes	Due to space limitations, we only provide the main sketch of proofs, while the detailed proofs can be found in Appendix B of our GitHub repository at https://github.com/kk97111/ALKDRec.
Open Datasets	Yes	Datasets. We evaluate ALKDRec and baselines on two real-world datasets, namely Hetrec2011-ML and Amazon Games.
Dataset Splits	Yes	We randomly split sessions into training, validation, and test sets by 6:2:2. For evaluation phase, we adopt the widely used leave-one-out evaluation protocol.
Hardware Specification	No	No specific hardware details for running experiments are provided. The paper mentions using the GPT-4-turbo API and associated costs ('e.g., around 44 minutes and 8.6 USD for ChatGPT API in Amazon-Games'), but not the underlying hardware specifications for the experimental environment.
Software Dependencies	No	The paper mentions using 'Adam optimizer' and 'GPT-4-turbo-2024-04-09' as the LLM teacher model, but does not provide specific version numbers for general software dependencies like programming languages or libraries used for implementation.
Experiment Setup	Yes	For an effective instance... we set µ = 10 empirically. For similar and incorrect instances, we assign them with lower gain values compared to effective instances, i.e., g_s^{si} = g_s^{in} = g_s^{ef}/2. We adopt the GPT-4-turbo-2024-04-09 as the LLM teacher to distill knowledge from 500 instances... We set the number of effective/similar/incorrect instances as 1:5:4... For αv in Equation (1), we assign 3/2/1... we set the latent dimensions for the teacher and student recommenders at 100 and 10... We set the learning rate as 1 × 10^{-3} with Adam optimizer and batch size as 1024 for all methods.
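
The Dataset Splits and Experiment Setup rows above can be collected into a small Python sketch. This is not the authors' code: the names `CONFIG`, `split_sessions`, and `leave_one_out` are hypothetical, the learning rate is read as 1e-3 from the garbled source text, and leave-one-out is assumed here to mean holding out the last item of each session as the prediction target.

```python
import random

# Values as quoted in the Experiment Setup row; the key names are hypothetical.
CONFIG = {
    "mu": 10,                        # gain setting for effective instances
    "distilled_instances": 500,      # instances sent to the LLM teacher
    "instance_ratio": (1, 5, 4),     # effective : similar : incorrect
    "teacher_dim": 100,              # latent dimension, teacher recommender
    "student_dim": 10,               # latent dimension, student recommender
    "learning_rate": 1e-3,           # assumed reading of the garbled "1 10 3"
    "optimizer": "Adam",
    "batch_size": 1024,
}


def split_sessions(sessions, ratios=(6, 2, 2), seed=0):
    """Randomly split sessions into train/validation/test by the given ratio."""
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_valid = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # remainder goes to the test set
    return train, valid, test


def leave_one_out(session):
    """Assumed protocol: the last item is the target, the rest is the input."""
    return session[:-1], session[-1]
```

Fixing the seed matters for a reproducibility record like this one, since the paper reports a random split without stating a seed.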