Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Forecasting Electric Vehicle Charging Station Occupancy: Smarter Mobility Data Challenge

Authors: Yvenn Amara-Ouali, Yannig Goude, Nathan Doumèche, Pascal Veyret, Alexis Thomas, Daniel Hebenstreit, Thomas Wedenig, Arthur Satouf, Aymeric Jan, Yannick Deleuze, Paul Berhaut, Sébastien Treguer

DMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This article presents the Smarter Mobility Data Challenge, which aims at testing statistical and machine learning forecasting models to predict the states of a set of charging stations in the Paris area at different geographical resolutions. This challenge involved analysing a dataset of 91 charging stations across four geographical areas over seven months in 2020-2021. The forecasts were evaluated at three spatial levels (individual stations, areas regrouping stations by neighborhoods and the global level of all the stations in Paris), thus capturing the different spatial information relevant to the various use cases. The results uncover meaningful patterns in EV usage and highlight the potential of this dataset to accurately predict EV charging behaviors. This open dataset addresses many real-world challenges associated with time series, such as missing values, non-stationarity and spatio-temporal correlations. Access to the dataset, code and benchmarks are available at https://gitlab.com/smarter-mobility-data-challenge/tutorials to foster future research.
Researcher Affiliation | Collaboration | Yvenn Amara-Ouali EMAIL EDF R&D and Université Paris-Saclay; Yannig Goude EMAIL EDF R&D and Université Paris-Saclay; Nathan Doumèche EMAIL EDF R&D and Sorbonne Université; Pascal Veyret EMAIL EDF R&D; Alexis Thomas EMAIL Ecole des Mines de Paris; Daniel Hebenstreit EMAIL Graz University of Technology; Thomas Wedenig EMAIL Graz University of Technology; Arthur Satouf EMAIL CY Tech; Aymeric Jan EMAIL SLB, AI Lab; Yannick Deleuze EMAIL Veolia S&TE; Paul Berhaut EMAIL Air Liquide; Sébastien Treguer EMAIL INRIA
Pseudocode | No | The paper describes methods and models in detail (e.g., CatBoost, XGBoost, ARIMA, FCNN, GNN) but does not present any explicitly labeled pseudocode or algorithm blocks. Figures 9 and 10 show workflow diagrams, not pseudocode.
Open Source Code | Yes | Access to the dataset, code and benchmarks are available at https://gitlab.com/smarter-mobility-data-challenge/tutorials to foster future research. The full dataset, baseline models, winning solutions, and aggregations are available at https://gitlab.com/smarter-mobility-data-challenge/tutorials and distributed under the Open Database License (ODbL). The code to reproduce these experiments is available at https://gitlab.com/smarter-mobility-data-challenge/tutorials/-/tree/master/2.%20Model%20Benchmark.
Open Datasets | Yes | Access to the dataset, code and benchmarks are available at https://gitlab.com/smarter-mobility-data-challenge/tutorials to foster future research. An open dataset on electric vehicle behaviors gathering both spatial and hierarchical features, available at https://gitlab.com/smarter-mobility-data-challenge/additional_materials. The dataset is based on the real-time charging station occupancy information of the Belib network, available on the Paris Data platform (ODbL) (of Paris, 2023). Two more complete datasets using new features and spanning from July 2020 to July 2022 are available at doi.org/10.5281/zenodo.8280566 and at gitlab.com/smarter-mobility-data-challenge/additional_materials.
Dataset Splits | Yes | For this data challenge, we split the data between a training and a testing set. ... The training set contains Dtrain points from 2020-07-03 00:00 to 2021-02-18 23:45. The test set contains Dtest points from 2021-02-19 00:00 to 2021-03-10 23:45. ... To create the public and the private sets, the test set was split into three subsets of one week each. The first week was assigned to the public set, and the third one to the private set. We randomly assigned 20% of the second week to the public set and the rest to the private set, as illustrated in Figure 5.
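The split described above can be sketched in a few lines of pandas. This is a minimal illustration, not the challenge code: the occupancy values are synthetic, and the exact week boundaries inside the test window are assumptions (the reported test window covers 20 days, so the third "week" here is slightly shorter).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the challenge data: one row per 15-minute step.
index = pd.date_range("2020-07-03 00:00", "2021-03-10 23:45", freq="15min")
df = pd.DataFrame({"occupancy": rng.integers(0, 4, len(index))}, index=index)

# Train/test split at the dates reported in the paper.
train = df.loc[:"2021-02-18 23:45"]
test = df.loc["2021-02-19 00:00":]

# Public/private split of the test window: week 1 -> public, week 3 -> private,
# week 2 -> 20% public / 80% private at random (assumed boundaries).
week1 = test.loc[:"2021-02-25 23:45"]
week2 = test.loc["2021-02-26 00:00":"2021-03-04 23:45"]
week3 = test.loc["2021-03-05 00:00":]

mask = rng.random(len(week2)) < 0.20
public = pd.concat([week1, week2[mask]])
private = pd.concat([week2[~mask], week3])
```

Splitting chronologically (rather than at random) preserves the non-stationarity the paper highlights: the model never sees future timestamps at training time.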
Hardware Specification | No | The paper discusses various models and their training, including mentioning "fast optimization relying on parallelization" for CatBoost, but it does not provide any specific hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions several software libraries and frameworks, such as CatBoost, skforecast, XGBoost, and optuna, as well as Python, but it does not specify version numbers for these dependencies, which would be necessary for reproducibility.
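Recording the missing version information is straightforward with the standard library. A minimal sketch, assuming the PyPI distribution names match the libraries the paper mentions (those names are an assumption here):

```python
# Record versions of the libraries named in the paper; the paper itself
# does not pin any of them.
from importlib.metadata import PackageNotFoundError, version

versions = {}
for pkg in ["catboost", "xgboost", "skforecast", "optuna"]:
    try:
        versions[pkg] = version(pkg)
    except PackageNotFoundError:
        versions[pkg] = None  # not installed in this environment

print(versions)
```

Shipping such a snapshot (or a pinned requirements file) alongside the repository would close this reproducibility gap.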
Experiment Setup | Yes | The paper provides specific experimental setup details for various models. For example, for XGBoost, it states 'an autoregressive XGBoost model... with 100 estimators... each having 4 targets, resulting in 364 models. Each model receives the last 20 target values...' For ARIMA, it specifies 'p = 2 past values... first-order differencing... (d = 1)... moving average part... of first-order (q = 1).' For CatBoost, it mentions 'C(4, 150) and Cexp(5, 200)', where the numbers represent depth and iterations. For FCNN, it states 'one hidden layer, 155 neurons, a learning rate of 7.8e-4, a dropout of 0.012, a batch size of 480 and 14 epochs.'