Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data

Authors: Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we rigorously test the performance of our data imputation method, BlockEcho, through a series of four carefully designed experiments using various real-world datasets available in the public domain. The initial set of experiments compare BlockEcho with state-of-the-art baselines, focusing on a fixed missing rate as high as 60%. Following this, further experiments are conducted to assess the stability of these models as the missing rate is incrementally increased. Additionally, an ablation study is performed to examine the impact of specific components of the method. Finally, we designed a downstream prediction task using the imputed data as input, to evaluate how the performance of the data imputation method influences the end-to-end performance of a real-world prediction task.
Researcher Affiliation	Collaboration	Qiao Han¹, Mingqian Li¹, Yao Yang¹ and Yiteng Zhai¹,²; ¹Zhejiang Lab, ²Nanyang Technological University
Pseudocode	No	The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code	No	The paper does not provide an explicit statement or link for the release of their source code.
Open Datasets	Yes	1. Traffic Datasets (Time-Street matrix). We employ two traffic flow datasets, ME-LA and PE-BAY, which collected in the highway of Los Angeles County and from California Transportation Agencies Performance Measurement System [Li et al., 2017]. 2. COVID-19 Dataset (Date-City matrix). This government dataset records daily COVID-19 cases (Cov-ca) and deaths (Cov-de) in major cities around the world since 2020. 3. Movie Dataset (User-Movie matrix). This dataset (Movie) records movie ratings collected from the MovieLens website [Harper and Konstan, 2015], which contains a large amount of missing data.
Dataset Splits	No	The paper describes masking data at various rates for evaluation but does not specify explicit train/validation/test dataset splits or a cross-validation setup for model training.
Hardware Specification	Yes	All models were trained on an Nvidia Tesla V100S PCIE GPU and each experiment is repeated for ten times with different random seeds, and the results are averaged.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies.
Experiment Setup	No	The paper describes experimental conditions like missing rates (e.g., "synthetically mask 60%", "dynamically adjust the data missing rate from 20% to 80%") and mentions a hyperparameter `h` in Section 4.1 without providing its value or specific system-level training configurations for BlockEcho.
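
The evaluation protocol summarized in the table (synthetically masking entries at a fixed rate, sweeping the missing rate from 20% to 80%, and averaging over ten random seeds) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the paper releases no source, and the table does not specify how masks are generated. The helper names (`block_mask`, `column_mean_impute`, `masked_mae`, `evaluate`), the 4x4 block shape, the MAE metric, and the toy matrix are all hypothetical, and the column-mean imputer is only a stand-in for BlockEcho.

```python
import random
import statistics


def block_mask(n_rows, n_cols, missing_rate, block_shape, rng):
    """Hide cells by dropping random rectangular blocks (assumed scheme).

    Blocks are added until exactly round(missing_rate * n_rows * n_cols)
    cells are hidden; the last block is truncated to hit the target.
    """
    target = round(missing_rate * n_rows * n_cols)
    block_h, block_w = block_shape
    hidden = set()
    while len(hidden) < target:
        r0 = rng.randrange(n_rows - block_h + 1)
        c0 = rng.randrange(n_cols - block_w + 1)
        for r in range(r0, r0 + block_h):
            for c in range(c0, c0 + block_w):
                if len(hidden) < target:
                    hidden.add((r, c))
    return hidden


def column_mean_impute(matrix, hidden):
    """Placeholder imputer (NOT BlockEcho): fill with the observed column mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    observed = [matrix[r][c] for r in range(n_rows) for c in range(n_cols)
                if (r, c) not in hidden]
    global_mean = statistics.fmean(observed)
    filled = [row[:] for row in matrix]
    for c in range(n_cols):
        column = [matrix[r][c] for r in range(n_rows) if (r, c) not in hidden]
        # Fall back to the global mean when a whole column is hidden.
        fill = statistics.fmean(column) if column else global_mean
        for r in range(n_rows):
            if (r, c) in hidden:
                filled[r][c] = fill
    return filled


def masked_mae(truth, filled, hidden):
    """Mean absolute error measured only on the hidden cells."""
    return statistics.fmean([abs(truth[r][c] - filled[r][c]) for r, c in hidden])


def evaluate(matrix, missing_rate, seeds=range(10), block_shape=(4, 4)):
    """Average the masked MAE over several seeds (ten by default)."""
    scores = []
    for seed in seeds:
        rng = random.Random(seed)
        hidden = block_mask(len(matrix), len(matrix[0]), missing_rate,
                            block_shape, rng)
        filled = column_mean_impute(matrix, hidden)
        scores.append(masked_mae(matrix, filled, hidden))
    return statistics.fmean(scores)


# Toy 20x12 matrix standing in for a real dataset (hypothetical data).
toy = [[float(r + c) for c in range(12)] for r in range(20)]
results = {rate: evaluate(toy, rate) for rate in (0.2, 0.4, 0.6, 0.8)}
for rate, score in results.items():
    print(f"missing rate {rate:.0%}: mean masked MAE {score:.3f}")
```

Seeding a separate `random.Random` per repetition keeps each run reproducible, which is the kind of detail the "Dataset Splits" and "Experiment Setup" rows flag as unspecified in the paper.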