PowerPM: Foundation Model for Power Systems

Authors: Shihao Tu, Yupeng Zhang, Jing Zhang, Zhendong Fu, Yin Zhang, Yang Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments span five real-world scenario datasets, including both private and public data. Through pre-training on massive ETS data, PowerPM achieves SOTA performance on diverse downstream tasks within the private dataset. Notably, when transferred to public datasets, PowerPM retains its edge, showcasing its remarkable generalization ability across various tasks and domains. Moreover, ablation studies and few-shot experiments further substantiate the effectiveness of our model.
Researcher Affiliation | Academia | Shihao Tu, Zhejiang University, shihao.tu@zju.edu.cn; Yupeng Zhang, Zhejiang University, yuppzhang@zju.edu.cn; Jing Zhang, Renmin University of China, zhang-jing@ruc.edu.cn; Zhendong Fu, Zhejiang University, zhendongfu@zju.edu.cn; Yin Zhang, Zhejiang University, yinzh@zju.edu.cn; Yang Yang, Zhejiang University, yangya@zju.edu.cn
Pseudocode | No | The paper describes its methodology in narrative text and illustrative figures (e.g., Figure 3: 'The pre-training framework of PowerPM'), but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Also, PowerPM is an off-the-shelf model with its code and weights. [...] Our code is provided as a supplement.
Open Datasets | Yes | The other four are collected from CAISO, ISONE, NYISO, and PJM. [Footnotes provide URLs: http://www.energyonline.com/Data/, https://www.iso-ne.com/isoexpress/web/reports/load-and-demand/, https://www.nyiso.com/load-data, https://dataminer2.pjm.com/list]
Dataset Splits | Yes | These downstream datasets are partitioned into train, validation, and test sets according to a 6:2:2 ratio, ensuring that the training set contains data from the earlier time period. [A chronological-split sketch follows the table.]
Hardware Specification | Yes | The pre-training stage of the experiment is implemented in PyTorch [24] and conducted on a Linux system with 2 CPUs (AMD EPYC 9654 96-Core Processor) and 8 GPUs (NVIDIA Tesla A800 80G) for about 8 days.
Software Dependencies | No | The paper mentions software like PyTorch, Sklearn, and GPT-2/Llama-7b models, but does not explicitly provide specific version numbers for these dependencies (e.g., 'PyTorch 1.9' or 'Scikit-learn 1.0').
Experiment Setup | Yes | For the model configuration, the temporal encoder contains a 26-layer Transformer encoder with model dimension 1024, inner dimension (FFN) 2048, and 16 attention heads, and the hierarchical encoder contains a 2-layer R-GCN. PowerPM contains about 250M parameters. During pre-training, 40% of the segments in each input window are masked using both random masking and causal masking, and the number of user clusters is set to 12. [...] We select 512 samples as a batch, and every batch contains about 174k patches, with patch length set to 48 and stride to 24. [...] We optimize with Adam [18], updating the model parameters every 4 steps, and the model trains for 1310k updates in total. A reduce-learning-rate-on-plateau scheduler is utilized to adjust the learning rate during pre-training. Specifically, we set the basic learning rate as 1e-6 and the maximum learning rate as 2e-5, and the learning rate is updated every 10k updates. [Also refers to Table 5 for more details. A configuration sketch follows the table.]
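
The Dataset Splits row describes a chronological 6:2:2 partition in which training data comes from the earliest time period. The snippet below is a minimal sketch of such a split; the function name `chronological_split` and the toy array are illustrative assumptions, not taken from the authors' released code.

```python
# Minimal sketch of a chronological 6:2:2 train/val/test split
# (illustrative; not the authors' code).
import numpy as np


def chronological_split(series: np.ndarray, ratios=(0.6, 0.2, 0.2)):
    """Split a time-ordered array without shuffling, so the training
    set covers the earliest portion of the series."""
    assert abs(sum(ratios) - 1.0) < 1e-8
    n = len(series)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    return series[:train_end], series[train_end:val_end], series[val_end:]


if __name__ == "__main__":
    data = np.arange(100)  # stand-in for an electricity time series
    train, val, test = chronological_split(data)
    print(len(train), len(val), len(test))  # 60 20 20
```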
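
The Experiment Setup row quotes the main pre-training hyperparameters. The PyTorch sketch below wires those numbers together: patching (length 48, stride 24), 40% random patch masking, a 26-layer Transformer encoder (model dimension 1024, FFN 2048, 16 heads), Adam with 4-step gradient accumulation, and a reduce-on-plateau scheduler. All module and variable names are assumptions; this is not the authors' implementation, and it omits the hierarchical R-GCN encoder, the causal-masking objective, the user clustering, and the exact schedule details referred to in Table 5.

```python
# Hedged sketch of the quoted pre-training setup (assumed structure,
# not PowerPM's released code).
import torch
import torch.nn as nn

PATCH_LEN, STRIDE, MASK_RATIO = 48, 24, 0.4
D_MODEL, N_HEADS, FFN_DIM, N_LAYERS = 1024, 16, 2048, 26


class PatchedTemporalEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.patch_embed = nn.Linear(PATCH_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, dim_feedforward=FFN_DIM,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=N_LAYERS)
        self.head = nn.Linear(D_MODEL, PATCH_LEN)  # reconstruct masked patches

    def forward(self, x):  # x: (batch, window_len)
        patches = x.unfold(-1, PATCH_LEN, STRIDE)      # (batch, n_patches, 48)
        mask = torch.rand(patches.shape[:2], device=x.device) < MASK_RATIO
        masked = patches.masked_fill(mask.unsqueeze(-1), 0.0)
        hidden = self.encoder(self.patch_embed(masked))
        recon = self.head(hidden)
        # reconstruction loss computed only on the masked patches
        return ((recon - patches) ** 2)[mask].mean()


model = PatchedTemporalEncoder()
# Max learning rate 2e-5 with a 1e-6 floor, as quoted; the paper's exact
# warm-up/update cadence (every 10k updates) is not reproduced here.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, min_lr=1e-6)

ACCUM_STEPS = 4  # "updating the model parameters every 4 steps"
for step in range(8):  # toy loop; real pre-training runs ~1310k updates
    window = torch.randn(2, 10 * PATCH_LEN)  # stand-in ETS windows
    loss = model(window) / ACCUM_STEPS
    loss.backward()
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step(loss.item())
```

With these constants the encoder alone holds roughly 220M parameters, which is consistent with the ~250M figure quoted for the full model once embeddings and the hierarchical encoder are included.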