Get Rid of Isolation: A Continuous Multi-task Spatio-Temporal Learning Framework

Authors: Zhongchao Yi, Zhengyang Zhou, Qihe Huang, Yanjiang Chen, Liheng Yu, Xu Wang, Yang Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further establish a benchmark of three cities for multi-task spatiotemporal learning, and empirically demonstrate the superiority of CMuST via extensive evaluations on these datasets.
Researcher Affiliation | Academia | Zhongchao Yi (1), Zhengyang Zhou (1,2,3), Qihe Huang (1), Yanjiang Chen (1), Liheng Yu (1), Xu Wang (1,2), Yang Wang (1,2); (1) University of Science and Technology of China (USTC), Hefei, China; (2) Suzhou Institute for Advanced Research, USTC, Suzhou, China; (3) State Key Laboratory of Resources and Environmental Information System, Beijing, China
Pseudocode | Yes | Algorithm 1: Rolling Adaptation Process
Open Source Code | Yes | Code is available at https://github.com/DILab-USTCSZ/CMuST.
Open Datasets | Yes | NYC: includes three months of crowd flow and taxi hailing data from Manhattan and its surrounding areas in New York City, encompassing four tasks: Crowd In, Crowd Out, Taxi Pick, and Taxi Drop (https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page). Chicago: comprises traffic data collected in the second half of 2023 from Chicago, including three tasks: Taxi Pick, Taxi Drop, and Risk (https://data.cityofchicago.org/browse).
Dataset Splits | Yes | Datasets were partitioned into training, validation, and testing sets with a 7:1:2 ratio (see the split sketch after this table).
Hardware Specification | Yes | The model was implemented with PyTorch on a Linux system equipped with a Tesla V100 (16 GB).
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not specify its version or any other software dependencies with their respective version numbers.
Experiment Setup | Yes | For MSTI, the embedding dimensions were d_obs = 24, d_s = 12, and d_t = 60, and the prompt dimension was 72. The self-attention and cross-attention dimensions were 168 and 24, respectively, each with 4 heads, and the FFN hidden dimension was 256. The Adam optimizer was adopted with an initial learning rate of 1×10⁻³ and a weight decay of 3×10⁻⁴, with early stopping applied. For RoAda, the threshold was δ = 10⁻⁶ (see the configuration sketch after this table).
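
The paper reports only the 7:1:2 ratio for the dataset splits. A chronological, non-shuffled split along the time axis is the usual convention for spatio-temporal forecasting; that ordering, the helper function, and the example array shapes below are assumptions, not details from the paper.

```python
import numpy as np

def chronological_split(data: np.ndarray, ratios=(0.7, 0.1, 0.2)):
    """Split a time-ordered array into train/val/test along the time axis.

    The 7:1:2 ratio comes from the paper; the chronological (non-shuffled)
    ordering is an assumption, chosen because shuffled splits would leak
    future observations into training for forecasting tasks.
    """
    n = data.shape[0]
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# Hypothetical example: 90 days of hourly observations, 100 regions, 4 tasks.
series = np.random.rand(90 * 24, 100, 4)
train, val, test = chronological_split(series)
print(train.shape, val.shape, test.shape)  # (1512, 100, 4) (216, 100, 4) (432, 100, 4)
```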
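The reported hyperparameters can be gathered into a single configuration; a minimal PyTorch sketch follows. Only the numeric values come from the paper: the dataclass layout, field names, stand-in model, and the early-stopping patience of 10 epochs are illustrative assumptions.

```python
from dataclasses import dataclass

import torch

# Numeric values are those reported in the paper's setup; the field names
# and this dataclass layout are assumptions for readability.
@dataclass
class CMuSTConfig:
    d_obs: int = 24            # observation embedding dimension
    d_s: int = 12              # spatial embedding dimension
    d_t: int = 60              # temporal embedding dimension
    d_prompt: int = 72         # prompt dimension
    d_self_attn: int = 168     # self-attention dimension
    d_cross_attn: int = 24     # cross-attention dimension
    n_heads: int = 4           # heads per attention block
    d_ffn: int = 256           # FFN hidden dimension
    lr: float = 1e-3           # initial learning rate for Adam
    weight_decay: float = 3e-4
    roada_delta: float = 1e-6  # RoAda convergence threshold

cfg = CMuSTConfig()
model = torch.nn.Linear(cfg.d_obs, 1)  # stand-in module; the real model is CMuST
optimizer = torch.optim.Adam(
    model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay
)

# Early stopping on validation loss, as mentioned in the setup; the
# patience value is an assumption (the paper does not report one).
best_val, bad_epochs, patience = float("inf"), 0, 10
for epoch in range(200):
    # train_one_epoch(model, optimizer, train_loader)  # hypothetical helper
    val_loss = 1.0 / (epoch + 1)  # placeholder for a real validation pass
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```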