BlockEcho: Retaining Long-Range Dependencies for Imputing Block-Wise Missing Data
Authors: Qiao Han, Mingqian Li, Yao Yang, Yiteng Zhai
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we rigorously test the performance of our data imputation method, BlockEcho, through a series of four carefully designed experiments using various real-world datasets available in the public domain. The initial set of experiments compare BlockEcho with state-of-the-art baselines, focusing on a fixed missing rate as high as 60%. Following this, further experiments are conducted to assess the stability of these models as the missing rate is incrementally increased. Additionally, an ablation study is performed to examine the impact of specific components of the method. Finally, we designed a downstream prediction task using the imputed data as input, to evaluate how the performance of the data imputation method influences the end-to-end performance of a real-world prediction task. |
| Researcher Affiliation | Collaboration | Qiao Han¹, Mingqian Li¹, Yao Yang¹ and Yiteng Zhai¹,²; ¹Zhejiang Lab, ²Nanyang Technological University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the release of their source code. |
| Open Datasets | Yes | 1. Traffic Datasets (Time-Street matrix). We employ two traffic flow datasets, ME-LA and PE-BAY, which were collected on the highways of Los Angeles County and from the California Transportation Agencies Performance Measurement System [Li et al., 2017]. 2. COVID-19 Dataset (Date-City matrix). This government dataset records daily COVID-19 cases (Cov-ca) and deaths (Cov-de) in major cities around the world since 2020. 3. Movie Dataset (User-Movie matrix). This dataset (Movie) records movie ratings collected from the MovieLens website [Harper and Konstan, 2015], which contains a large amount of missing data. |
| Dataset Splits | No | The paper describes masking data at various rates for evaluation but does not specify explicit train/validation/test dataset splits or a cross-validation setup for model training. |
| Hardware Specification | Yes | All models were trained on an Nvidia Tesla V100S PCIe GPU, and each experiment is repeated ten times with different random seeds, with the results averaged. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | No | The paper describes experimental conditions like missing rates (e.g., "synthetically mask 60%", "dynamically adjust the data missing rate from 20% to 80%") and mentions a hyperparameter `h` in Section 4.1 without providing its value or specific system-level training configurations for Block Echo. |
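
Since the paper releases no code and does not fully specify its masking procedure, the evaluation protocol summarized in the table (synthetic block-wise masking at a target missing rate, ten seeds, averaged results) can only be approximated. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `block_mask` and `masked_rmse`, the rectangular block shape, the random block placement, the RMSE metric, and the global-mean placeholder imputer are all hypothetical choices made for illustration.

```python
import numpy as np


def block_mask(shape, missing_rate, block_shape=(4, 4), seed=0):
    """Return a boolean mask (True = missing) made of rectangular blocks.

    Assumption: blocks of a fixed, hypothetical size are dropped at random
    positions until at least `missing_rate` of the entries are masked.
    The final rate can overshoot the target by less than one block.
    """
    if not 0.0 <= missing_rate < 1.0:
        raise ValueError("missing_rate must be in [0, 1)")
    rng = np.random.default_rng(seed)
    rows, cols = shape
    bh, bw = min(block_shape[0], rows), min(block_shape[1], cols)
    mask = np.zeros(shape, dtype=bool)
    target = int(np.ceil(missing_rate * rows * cols))
    while mask.sum() < target:
        r = rng.integers(0, rows - bh + 1)  # top-left corner of the block
        c = rng.integers(0, cols - bw + 1)
        mask[r:r + bh, c:c + bw] = True
    return mask


def masked_rmse(truth, imputed, mask):
    """RMSE computed only on the entries that were masked out."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic stand-in for a real matrix (e.g. a Time-Street matrix).
    data = np.random.default_rng(42).normal(size=(50, 40))
    for rate in (0.2, 0.4, 0.6, 0.8):
        scores = []
        for seed in range(10):  # ten seeds, as reported in the table
            mask = block_mask(data.shape, rate, seed=seed)
            # Placeholder imputer: fill with the mean of observed entries.
            imputed = np.where(mask, data[~mask].mean(), data)
            scores.append(masked_rmse(data, imputed, mask))
        print(f"missing rate {rate:.0%}: mean RMSE {np.mean(scores):.3f}")
```

Replacing the placeholder imputer with an actual model, and the synthetic matrix with one of the public datasets, would give a rough analogue of the missing-rate sweep from 20% to 80%; the block size and placement strategy would still need to be confirmed against the paper.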