Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval
Authors: Qi Yan, Raihan Seraj, Jiawei He, Lili Meng, Tristan Sylvain
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%. |
| Researcher Affiliation | Collaboration | Qi Yan¹, Raihan Seraj², Jiawei He², Lili Meng³, Tristan Sylvain²; ¹University of British Columbia, ²Borealis AI, ³Independent Researcher |
| Pseudocode | No | The paper describes the architecture and components of the model but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/BorealisAI/Autocast-plus-plus. |
| Open Datasets | Yes | We assess our model on the Autocast dataset (Zou et al., 2022) |
| Dataset Splits | No | Table 1 provides 'Question Type Train Test Total' for the Autocast dataset, and the text mentions 'The dataset is partitioned with a cut-off in mid-2021 and questions in the test set span from mid-2021 to mid-2022.' While hyper-parameter optimization is mentioned, the specific details of a validation split are not provided. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using pre-trained models like GPT-3 and T5, and techniques like LoRA, but it does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, CUDA versions) required for reproducibility. |
| Experiment Setup | Yes | We initially retrieve K = 50 news articles using BM25 and proceed with our re-ranking process to select N = 10 unless otherwise specified. The reweighting coefficient λ in Eq. (6) is fixed at 0.1. |
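
The Experiment Setup row describes a two-stage pipeline: BM25 retrieval of K = 50 articles, followed by re-ranking to keep N = 10. The following is a minimal sketch of that retrieve-then-rerank pattern, not the authors' implementation. The whitespace tokenizer, the BM25 constants `k1` and `b`, and the function names are assumptions made for illustration. The `rerank_score` callable is a hypothetical stand-in for whatever relevance model performs the re-ranking, and the reweighting coefficient λ from Eq. (6) is not modeled here.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_then_rerank(query, docs, rerank_score, k=50, n=10):
    """Keep the top-k documents by BM25, then the top-n of those by rerank_score.

    rerank_score(query, doc) -> float is a hypothetical placeholder for the
    re-ranking model; higher means more relevant.
    """
    bm25 = bm25_scores(query, docs)
    top_k = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:k]
    reranked = sorted(top_k, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:n]]
```

With the defaults `k=50` and `n=10`, the call mirrors the settings quoted in the table; any scoring function that takes a query and a document can be passed as `rerank_score`.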