Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Active Large Language Model-Based Knowledge Distillation for Session-Based Recommendation
Authors: Yingpeng Du, Zhu Sun, Ziyan Wang, Haoyan Chua, Jie Zhang, Yew-Soon Ong
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world datasets show that our method significantly outperforms state-of-the-art methods for SBR. We conduct extensive experiments to evaluate the performance of ALKDRec and answer four research questions. |
| Researcher Affiliation | Academia | ¹College of Computing and Data Science, Nanyang Technological University, Singapore; ²Information Systems Technology and Design, Singapore University of Technology and Design, Singapore; ³A*STAR Center for Frontier AI Research, Singapore |
| Pseudocode | No | The paper describes the proposed method conceptually and mathematically, including definitions, theorems, and equations, but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | Yes | Due to space limitations, we only provide the main sketch of proofs, while the detailed proofs can be found in Appendix B of our GitHub repository at https://github.com/kk97111/ALKDRec. |
| Open Datasets | Yes | Datasets. We evaluate ALKDRec and baselines on two real-world datasets, namely Hetrec2011-ML and Amazon Games. |
| Dataset Splits | Yes | We randomly split sessions into training, validation, and test sets by 6:2:2. For evaluation phase, we adopt the widely used leave-one-out evaluation protocol. |
| Hardware Specification | No | No specific hardware details for running experiments are provided. The paper mentions using the GPT-4-turbo API and associated costs ('e.g., around 44 minutes and 8.6 USD for ChatGPT API in Amazon-Games'), but not the underlying hardware specifications for the experimental environment. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'GPT-4-turbo-2024-04-09' as the LLM teacher model, but does not provide specific version numbers for general software dependencies like programming languages or libraries used for implementation. |
| Experiment Setup | Yes | For an effective instance... we set $\mu = 10$ empirically. For similar and incorrect instances, we assign them with lower gain values compared to effective instances, i.e., $g^{si}_s = g^{in}_s = g^{ef}_s / 2$. We adopt the GPT-4-turbo-2024-04-09 as the LLM teacher to distill knowledge from 500 instances... We set the number of effective/similar/incorrect instances as 1:5:4... For $\alpha_v$ in Equation (1), we assign 3/2/1... we set the latent dimensions for the teacher and student recommenders at 100 and 10... We set the learning rate as $1 \times 10^{-3}$ with Adam optimizer and batch size as 1024 for all methods. |
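
The Dataset Splits row quotes a random 6:2:2 split of sessions and a leave-one-out evaluation protocol. The following is a minimal sketch of what such a procedure might look like, not the authors' implementation. The function names `split_sessions` and `leave_one_out`, the fixed seed, the representation of a session as a list of integer item IDs, and the reading of leave-one-out as "hold out the last item of each session" are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Session = List[int]  # assumption: a session is an ordered list of item IDs


def split_sessions(
    sessions: Sequence[Session],
    ratios: Tuple[int, int, int] = (6, 2, 2),
    seed: int = 0,
) -> Tuple[List[Session], List[Session], List[Session]]:
    """Randomly split sessions into train/validation/test by integer ratios."""
    shuffled = list(sessions)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability (assumption)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_valid = n * ratios[1] // total
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


def leave_one_out(session: Session) -> Tuple[Session, int]:
    """Hold out the last item of a session as the prediction target."""
    if len(session) < 2:
        raise ValueError("a session needs at least two items")
    return session[:-1], session[-1]


if __name__ == "__main__":
    toy_sessions = [[i, i + 1, i + 2] for i in range(10)]  # hypothetical data
    train, valid, test = split_sessions(toy_sessions)
    print(len(train), len(valid), len(test))
    print(leave_one_out(test[0]))
```

Integer ratios are used instead of floating-point fractions so that the split sizes do not depend on rounding behavior.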