Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Learning Without Augmenting: Unsupervised Time Series Representation Learning via Frame Projections

Authors: Berken Utku Demirel, Christian Holz

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our method on nine datasets across five temporal sequence tasks, where signal-specific characteristics make data augmentations particularly challenging. Without relying on augmentation-induced diversity, our method achieves performance gains of up to 15–20% over existing self-supervised approaches.
Researcher Affiliation	Academia	Berken Utku Demirel Department of Computer Science ETH Zürich, Switzerland EMAIL Christian Holz Department of Computer Science ETH Zürich, Switzerland EMAIL
Pseudocode	Yes	We provide pseudocode implementation of our method in Appendix B. Algorithm 1 Pre-training algorithm for the proposed method Algorithm 2 The proposed method for inference
Open Source Code	Yes	Source code: https://github.com/eth-siplab/Learning-with-FrameProjections
Open Datasets	Yes	Heart rate: We used the IEEE Signal Processing Cup in 2015 (IEEE SPC) [35], and DaLiA [36] for PPG-based heart rate prediction from wrist. Activity recognition: We used HHAR [37], and USC [38] for activity recognition from inertial measurement units from smartphones or wearable devices. Cardiovascular disease (CVD) classification: We conducted experiments on China Physiological Signal Challenge (CPSC2018) [39] and Chapman University, Shaoxing People's Hospital (Chapman) datasets [40]. Step counting: We used the Clemson dataset [42], which was released for pedometer evaluation. Sleep stage classification: We used the Sleep-EDF dataset, from PhysioBank [43], which includes 197 whole-night PSG sleep recordings
Dataset Splits	Yes	We used the leave-one-session-out (LOSO) cross-validation, which evaluates models on subjects/sessions that were not used for training. We evaluated the cross-person generalization performance of the models, that is, the model was evaluated on previously unseen subjects. We split the dataset for fine-tuning and testing based on patients (each patient's recordings appear in only one set). We conducted 10-fold cross-validation, with each fold consisting of 3 subjects. Pre-training is performed using 9 folds, with the remaining fold held out for testing. We split the dataset to 80–20% for training and testing as suggested in [40].
Hardware Specification	Yes	We performed our experiments on NVIDIA GeForce RTX 4090 GPUs, involving training with three random seeds for all datasets, totaling approximately 680 GPU hours including ablation. All experiments fit within 24 GB of GPU memory, without requiring excessive computational resources.
Software Dependencies	Yes	We normalize the FFT using 1/√n by setting norm="ortho" in PyTorch's [92] torch.fft.rfft, ensuring the transformation is orthonormal. We use the PyWavelets [93] implementation.
Experiment Setup	Yes	We train models with a batch size of 1024 for 256 epochs and decay the learning rate using the cosine decay schedule. After pre-training, we train a single linear layer classifier on features extracted from the frozen pre-trained network. The models were optimized using Adam [52] with a learning rate of 0.003, while the linear layer was fine-tuned with a learning rate of 0.03.
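
The Software Dependencies row quotes an orthonormal FFT obtained with norm="ortho" in torch.fft.rfft. The following is a dependency-free sketch of what that scaling means, not the authors' code: it computes a one-sided DFT with a 1/√n factor and checks that signal energy is preserved. The function names and the sample signal are assumptions made for illustration.

```python
import cmath
import math


def rfft_ortho(x):
    """One-sided DFT of a real sequence, scaled by 1/sqrt(n)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def spectrum_energy(coeffs, n):
    """Energy of the full spectrum, recovered from the one-sided half."""
    total = 0.0
    for k, c in enumerate(coeffs):
        # DC (and Nyquist when n is even) occur once; every other bin
        # has a complex-conjugate twin that the one-sided output omits.
        is_unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        total += abs(c) ** 2 * (1 if is_unpaired else 2)
    return total


# Hypothetical sample signal, chosen only for illustration.
signal = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 1.0]
coeffs = rfft_ortho(signal)
time_energy = sum(v * v for v in signal)
freq_energy = spectrum_energy(coeffs, len(signal))
```

With this scaling, time_energy and freq_energy agree up to floating-point error, which is the practical consequence of the transform being orthonormal.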
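
The Dataset Splits row describes 10-fold cross-validation with 3 subjects per fold, where 9 folds are used for pre-training and the remaining fold is held out. A minimal sketch of such a subject-wise split is shown below. It assumes 30 hypothetical subject IDs assigned to folds in consecutive order; the quoted text does not state how subjects were assigned, and the function names are illustrative.

```python
def subject_folds(subject_ids, subjects_per_fold):
    """Partition subject IDs into consecutive, non-overlapping folds."""
    ids = list(subject_ids)
    return [ids[i:i + subjects_per_fold] for i in range(0, len(ids), subjects_per_fold)]


def train_test_splits(folds):
    """Yield (train_subjects, test_subjects), holding out one fold at a time."""
    for held_out, test_subjects in enumerate(folds):
        train_subjects = [
            s for i, fold in enumerate(folds) if i != held_out for s in fold
        ]
        yield train_subjects, test_subjects


# Assumption: subject IDs 0..29, giving 10 folds of 3 subjects each.
folds = subject_folds(range(30), subjects_per_fold=3)
splits = list(train_test_splits(folds))
```

Splitting by subject rather than by recording keeps every subject's data on one side of the split, which is what makes the evaluation a test on previously unseen subjects.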
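
The Experiment Setup row mentions a cosine decay schedule over 256 epochs with a learning rate of 0.003. The sketch below shows one common form of that schedule. It assumes a per-epoch update, no warmup, and decay toward zero; none of these details is stated in the quoted text, and cosine_lr is a hypothetical helper.

```python
import math


def cosine_lr(epoch, total_epochs=256, base_lr=0.003, min_lr=0.0):
    """Learning rate at a given epoch under cosine decay from base_lr to min_lr."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# One learning rate per training epoch (epochs 0..255).
schedule = [cosine_lr(e) for e in range(256)]
```

Under these assumptions the rate starts at 0.003, passes through half that value at the midpoint, and approaches zero by the final epoch.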