PowerPM: Foundation Model for Power Systems
Authors: Shihao Tu, Yupeng Zhang, Jing Zhang, Zhendong Fu, Yin Zhang, Yang Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments span five real-world scenario datasets, including both private and public data. Through pre-training on massive ETS data, PowerPM achieves SOTA performance on diverse downstream tasks within the private dataset. Notably, when transferred to public datasets, PowerPM retains its edge, showcasing its remarkable generalization ability across various tasks and domains. Moreover, ablation studies and few-shot experiments further substantiate the effectiveness of our model. |
| Researcher Affiliation | Academia | Shihao Tu (Zhejiang University, shihao.tu@zju.edu.cn); Yupeng Zhang (Zhejiang University, yuppzhang@zju.edu.cn); Jing Zhang (Renmin University of China, zhang-jing@ruc.edu.cn); Zhendong Fu (Zhejiang University, zhendongfu@zju.edu.cn); Yin Zhang (Zhejiang University, yinzh@zju.edu.cn); Yang Yang (Zhejiang University, yangya@zju.edu.cn) |
| Pseudocode | No | The paper describes its methodology in narrative text and illustrative figures (e.g., Figure 3: 'The pre-training framework of PowerPM'), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Also, PowerPM is an off-the-shelf model with its code and weights. [...] Our code is provided as a supplement. |
| Open Datasets | Yes | The other four are collected from CAISO, ISONE, NYISO, and PJM. [Footnotes provide URLs: http://www.energyonline.com/Data/, https://www.iso-ne.com/isoexpress/web/reports/load-and-demand/, https://www.nyiso.com/load-data, https://dataminer2.pjm.com/list] |
| Dataset Splits | Yes | These downstream datasets are partitioned into train, validation, and test sets according to a 6:2:2 ratio, ensuring that the training set contains data from the earliest time period. (A minimal chronological-split sketch is given below, after the table.) |
| Hardware Specification | Yes | The pre-training stage of the experiment is implemented in PyTorch [24] and conducted on a Linux system with 2 CPUs (AMD EPYC 9654 96-Core Processor) and 8 GPUs (NVIDIA Tesla A800 80G) for about 8 days. |
| Software Dependencies | No | The paper mentions software like PyTorch, Sklearn, and GPT-2/Llama-7b models, but does not explicitly provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9' or 'Scikit-learn 1.0'). |
| Experiment Setup | Yes | For the model configuration, the temporal encoder contains a 26-layer Transformer encoder with model dimension 1024, inner dimension (FFN) 2048, and 16 attention heads, and the hierarchical encoder contains a 2-layer R-GCN. PowerPM contains about 250M parameters. During pre-training, 40% of the segments in each input window are masked, using both random masking and causal masking, and the number of user clusters is set to 12. [...] We select 512 samples as a batch, and every batch contains about 174k patches, with patch length set to 48 and stride to 24. [...] We optimize with Adam [18], updating the model parameters every 4 steps, and the model trains for 1310k updates in total. A reduce-LR-on-plateau scheduler is used to adjust the learning rate during pre-training. Specifically, we set the basic learning rate to 1e-6 and the maximum learning rate to 2e-5, and the learning rate is updated every 10k updates. [Also refers to Table 5 for more details.] (An illustrative configuration sketch follows the table.) |
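The chronological 6:2:2 split quoted in the "Dataset Splits" row can be illustrated with a minimal sketch. The helper name `chronological_split` and the use of a time-indexed pandas DataFrame are assumptions for illustration; the paper does not provide split code.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, ratios=(0.6, 0.2, 0.2)):
    """Split a time-indexed DataFrame into train/val/test without shuffling,
    so the training set covers the earliest period (hypothetical helper)."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    df = df.sort_index()                          # enforce chronological order
    n = len(df)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test
```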
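The pre-training hyperparameters quoted in the "Experiment Setup" row can be collected into a single configuration sketch. The dataclass, the stand-in `nn.TransformerEncoder`, and the optimizer/scheduler wiring below are illustrative assumptions consistent with the reported numbers, not the authors' released code; in particular, interpreting the "basic" and "maximum" learning rates as the scheduler floor and the starting rate is a guess.

```python
from dataclasses import dataclass
import torch

@dataclass
class PowerPMPretrainConfig:
    # Temporal encoder (values quoted from the paper)
    num_layers: int = 26
    d_model: int = 1024
    d_ffn: int = 2048
    num_heads: int = 16
    rgcn_layers: int = 2           # hierarchical encoder (R-GCN)
    # Patching and masking
    patch_len: int = 48
    stride: int = 24
    mask_ratio: float = 0.40       # 40% of segments, random + causal masking
    num_user_clusters: int = 12
    # Optimization
    batch_size: int = 512
    grad_accum_steps: int = 4      # update parameters every 4 steps
    total_updates: int = 1_310_000
    base_lr: float = 1e-6          # "basic" learning rate
    max_lr: float = 2e-5           # "maximum" learning rate

cfg = PowerPMPretrainConfig()

# Stand-in for the temporal encoder only; the paper's full model also has the
# R-GCN hierarchical encoder, which is omitted here.
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(
        d_model=cfg.d_model, nhead=cfg.num_heads,
        dim_feedforward=cfg.d_ffn, batch_first=True),
    num_layers=cfg.num_layers)

optimizer = torch.optim.Adam(model.parameters(), lr=cfg.max_lr)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, min_lr=cfg.base_lr)
```

Patch extraction with the quoted patch length and stride amounts to a strided sliding window; with `torch.Tensor.unfold`, a batch of series of shape `(batch, length)` becomes `(batch, num_patches, 48)`:

```python
series = torch.randn(cfg.batch_size, 96 * 7)   # e.g., one week of 15-min readings (assumed resolution)
patches = series.unfold(dimension=-1, size=cfg.patch_len, step=cfg.stride)
# patches.shape == (512, 27, 48) for this example length
```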