Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Privacy-Aware Time Series Synthesis via Public Knowledge Distillation
Authors: Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains. ... In this section, we assess the performance of Pub2Priv by analyzing the privacy-utility trade-off in comparison to state-of-the-art baselines across multiple domains. |
| Researcher Affiliation | Industry | Penghang Liu EMAIL JPMorgan AI Research; Haibei Zhu EMAIL JPMorgan AI Research; Eleonora Kreacic EMAIL JPMorgan AI Research; Svitlana Vyetrenko EMAIL JPMorgan AI Research |
| Pseudocode | Yes | Algorithm 1 Training Algorithm for Differentially Private Generator θ_DM |
| Open Source Code | No | The paper does not explicitly state that the code for Pub2Priv is open-source or provide a link to a code repository. It mentions using 'Opacus (Yousefpour et al., 2021)' which is a third-party library, but this does not imply the authors' own implementation code is available. |
| Open Datasets | Yes | Electricity usage: The private data contains the daily electricity consumption of 370 users in Évora, Portugal (Bessa et al., 2015; Trindade, 2015) from 2011 to 2014. ... Artur Trindade. ElectricityLoadDiagrams20112014. UCI Machine Learning Repository, 10:C58C86, 2015. ... Semiconductor trading: We also collected international trading data from the UN Comtrade dataset 1. ... 1UN Comtrade dataset: https://comtradeplus.un.org/ |
| Dataset Splits | No | The paper describes how data is used for evaluation metrics (e.g., 'The original and synthetic data are evenly distributed in both training and testing datasets' for TSTR discriminative), but it does not provide specific split percentages or sample counts for training, validation, and testing of the primary datasets for their proposed model. |
| Hardware Specification | Yes | All experiments in the paper were conducted on AWS g4dn.4xlarge instances (16 vCPUs, 64 GB RAM, 16 GB GPU). |
| Software Dependencies | No | The paper mentions using 'Opacus (Yousefpour et al., 2021)' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We employ DP-SGD to protect the private data during training, which consists of two major procedures: gradient clipping and gradient noise addition. ... the gradients are clipped according to their ℓ2 norm and the clipping threshold C (we explore C ∈ {0.1, 0.5, 1.0, 1.5, 2.0} and select the one with lowest validation loss). ... Table 3: Hyperparameters of Pub2Priv. ... Table 4: Hyperparameters of baseline models. ... Learning rate 1e-4 ... We train all models with batch size 32 for 100 epochs. |
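The DP-SGD procedure quoted above (per-sample ℓ2 gradient clipping with threshold C, followed by Gaussian noise addition) can be sketched as follows. This is a minimal NumPy illustration of the general DP-SGD aggregation step, not the paper's implementation (which uses Opacus); the function name and `noise_multiplier` parameter are illustrative.

```python
import numpy as np

def dp_sgd_aggregate(per_sample_grads, C=1.0, noise_multiplier=1.0, seed=None):
    """One DP-SGD gradient aggregation step (illustrative sketch).

    Each per-sample gradient is clipped to l2 norm at most C, the clipped
    gradients are summed, Gaussian noise with std noise_multiplier * C is
    added, and the result is averaged over the batch.
    """
    rng = np.random.default_rng(seed)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose l2 norm exceeds the threshold C.
        clipped.append(g * min(1.0, C / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise is calibrated to the clipping threshold (sensitivity C).
    noisy = total + rng.normal(0.0, noise_multiplier * C, size=total.shape)
    return noisy / len(per_sample_grads)
```

In practice the clipping threshold C would be swept over the grid quoted above ({0.1, 0.5, 1.0, 1.5, 2.0}) and chosen by validation loss, with the noise scale set by the target privacy budget.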