Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Cosmos: Compressed and Smooth Latent Space for Text Diffusion Modeling

Authors: Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin, Aleksandr Abramov, Dmitry P Vetrov

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we demonstrate that text representations can be compressed up to 8× while maintaining generation quality comparable to token-level diffusion models. Furthermore, increasing the latent sequence length allows COSMOS to surpass both diffusion-based and autoregressive baselines. We evaluate COSMOS on four diverse generative tasks including story generation, question generation, summarization, and detoxification and compare it with various generative paradigms.
Researcher Affiliation	Collaboration	Viacheslav Meshchaninov* (HSE University; Constructor University); Egor Chimbulatov (HSE University); Alexander Shabalin (HSE University; Constructor University); Aleksandr Abramov (SaluteDevices); Dmitry Vetrov (Constructor University)
Pseudocode	No	The paper describes the methodology in prose and uses diagrams (e.g., Figure 1) to illustrate the pipeline, but does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code is released on GitHub.
Open Datasets	Yes	ROCStories This dataset [24] consists of five-sentence stories... Wikipedia For large-scale experiments, we use the English Wikipedia subset from the ROOTS corpus [15]... XSum This dataset [26] is used for abstractive summarization... SQuAD2.0 This question-answering dataset [31]... ParaDetox For small-scale conditional generation experiments, we use the ParaDetox dataset [17]... OpenWebText (OWT) [9] dataset.
Dataset Splits	Yes	ROCStories This dataset [24] consists of five-sentence stories... It contains a total of 98,161 instances, of which 88,161 are used for training, 10,000 for validation. ... XSum ... It includes 204,045 training instances, 11,332 validation instances, and 11,334 test instances. ... SQuAD2.0 ... The dataset contains 130,319 training and 11,873 test instances.
Hardware Specification	Yes	All models are trained on 8 NVIDIA A100 GPUs.
Software Dependencies	No	The paper mentions using bfloat16 data type and specific models like BERT and GPT-2, but does not provide specific software dependencies (e.g., Python, PyTorch, TensorFlow) along with their version numbers.
Experiment Setup	Yes	All models are trained on 8 NVIDIA A100 GPUs. Detailed training configurations and approximate durations for both COSMOS (N = 128) and COSMOS (N = 16) are provided in Table 7. Table 7 lists: Optimizer AdamW, Learning Rate 2e-4, (β1, β2) (0.9, 0.98), Warmup Steps 1000, Learning Rate Schedule Constant, Weight Decay 0.01, Gradient Clipping 1, EMA Decay 0.9999, Batch Size 1024, Training Steps (e.g., 200k for ROCStories), Max Seq Length (e.g., 80), Max Context Length (e.g., 512), Sampling Steps 200, Schedule Parameter (e.g., 5).
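The Table 7 settings summarized above can be read as a training configuration. The following is a minimal, framework-free Python sketch of how those values might be held and applied; it is not the authors' code. The names TrainConfig, lr_at_step, and ema_update are hypothetical, the per-dataset values (200k steps, max sequence length 80, max context length 512) are just the examples quoted above, and the linear shape of the 1000-step warmup is an assumption, since the summary only lists a warmup length followed by a constant schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values transcribed from the Table 7 summary; dataset-specific
    # fields use the example values quoted there.
    optimizer: str = "AdamW"
    learning_rate: float = 2e-4
    betas: tuple[float, float] = (0.9, 0.98)
    warmup_steps: int = 1000
    weight_decay: float = 0.01
    grad_clip_norm: float = 1.0
    ema_decay: float = 0.9999
    batch_size: int = 1024
    training_steps: int = 200_000
    max_seq_length: int = 80
    max_context_length: int = 512
    sampling_steps: int = 200


def lr_at_step(cfg: TrainConfig, step: int) -> float:
    """Learning rate at a given (0-indexed) optimizer step.

    ASSUMPTION: linear warmup from near 0 up to the peak rate over
    `warmup_steps`, then the constant rate listed in Table 7.
    """
    if step < cfg.warmup_steps:
        return cfg.learning_rate * (step + 1) / cfg.warmup_steps
    return cfg.learning_rate


def ema_update(ema_params: list[float], params: list[float], decay: float) -> list[float]:
    """One exponential-moving-average step: ema <- decay * ema + (1 - decay) * param."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


if __name__ == "__main__":
    cfg = TrainConfig()
    for step in (0, 499, 999, 50_000):
        print(step, lr_at_step(cfg, step))
    print(ema_update([0.0, 1.0], [1.0, 1.0], cfg.ema_decay))
```

With an EMA decay of 0.9999, each update moves the averaged weights only 0.01% of the way toward the current weights, so the EMA copy changes slowly relative to the raw parameters over a 200k-step run.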