Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cycle Conditioning for Robust Representation Learning from Categorical Data
Authors: Mohsen Tabejamaat, Farzaneh Etminani, Mattias Ohlsson
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our method achieves at least a 1.42% improvement in AUROC and a 4.12% improvement in AUCPR over the best results from existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Mohsen Tabejamaat (School of Information Technology, Halmstad University, Sweden); Farzaneh Etminani (Centrum for Research and Innovation, Region Halland, Sweden; School of Information Technology, Halmstad University, Sweden); Mattias Ohlsson (Centre for Environmental and Climate Science, Lund University, Sweden; School of Information Technology, Halmstad University, Sweden) |
| Pseudocode | Yes | Algorithm 1: Transforming categorical data to continuous for time series analysis with multiple tokens per time step |
| Open Source Code | No | The paper does not provide a specific repository link, an explicit code release statement, or indicate that code is available in supplementary materials. Phrases like 'we plan to release' or 'available upon request' are also not present. |
| Open Datasets | Yes | Our experiments are conducted using two well-known databases, MIMIC III (Johnson et al., 2016) and MIMIC IV (Johnson et al., 2023), each containing records of medical activities from various patient visits to a hospital. |
| Dataset Splits | Yes | Data split: Most of our experiments are conducted on MIMIC IV, where we randomly select 126,736 time series for training our representation learning method. The evaluation sets for different tasks are composed of another 31,684 randomly selected samples, which are used to evaluate the performance of the learned representations on various downstream tasks. For MIMIC III, all samples serve as training data for the representation learning model. The training and the evaluation of the downstream task are then conducted using the same split of the MIMIC IV. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU models, or memory. |
| Software Dependencies | No | The paper mentions software like 'Python', 'PyTorch', 'CUDA' in the context of BERT or general frameworks, but does not provide specific version numbers for these or other key software components used for their methodology. |
| Experiment Setup | Yes | The representation dimension of our method, along with the models that utilize BERT as their backbone architecture, are set to 400. For SwitchTab (Wu et al., 2024), ReConTab (Chen et al., 2023), and SCARF (Bahri et al., 2021), we follow the same protocol as described in their original papers, resulting in representation dimensions of 52, 256, and 256, respectively. ... In practice, we leverage a Multi-Step Optimization approach, which allows the introduction of extra loss functions alongside the MSE loss of the diffusion models with minimal impact on their performance. Specifically, we alternate between optimizing the MSE and cross-entropy losses. The cross-entropy losses, L_spl and L_cycle-spl, are only applied if any of the MSE losses exceed a threshold of 0.05. During the multi-step optimization, we perform a single forward pass to compute both the DDPM's MSE losses and the cross-entropy losses. In the first step, we update the model parameters with respect to the MSE losses while freezing the components Mθ and dθ affected by the cross-entropy losses. In the next step, we update the model with respect to the cross-entropy losses, freezing the components relevant to the MSE losses. |
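The alternating "multi-step" optimization quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: plain scalars stand in for the diffusion parameters and for the components Mθ and dθ touched by the cross-entropy losses, and the quadratic losses, learning rate, and function names (`mse_loss`, `ce_loss`, `multi_step_update`) are all assumptions; only the alternating-update and 0.05-threshold logic follows the paper's description.

```python
LR = 0.1
CE_THRESHOLD = 0.05  # CE losses applied only while any MSE loss exceeds this

def mse_loss(diff):
    # Stand-in for the DDPM's MSE losses (toy quadratic, minimum at 1.0).
    return (diff - 1.0) ** 2

def ce_loss(clf):
    # Stand-in for the cross-entropy losses L_spl / L_cycle-spl.
    return (clf - 2.0) ** 2

def multi_step_update(diff, clf):
    # Single forward pass computes both loss families, as in the paper.
    l_mse = mse_loss(diff)
    # Step 1: update w.r.t. the MSE losses, freezing the CE-affected
    # components (clf is left untouched here).
    diff -= LR * 2.0 * (diff - 1.0)
    # Step 2: update w.r.t. the CE losses (MSE-side parameters frozen),
    # but only while the MSE loss is still above the threshold.
    if l_mse > CE_THRESHOLD:
        clf -= LR * 2.0 * (clf - 2.0)
    return diff, clf

diff, clf = 0.0, 0.0
for _ in range(50):
    diff, clf = multi_step_update(diff, clf)
```

After enough iterations the diffusion stand-in converges to its optimum, while the classifier stand-in stops updating as soon as the MSE loss drops below 0.05, mirroring the conditional application of the cross-entropy losses.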