Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Unified Scaling Laws for Compressed Representations

Authors: Andrei Panferov, Alexandra Volkova, Ionut-Vlad Modoranu, Vage Egiazarian, Mher Safaryan, Dan Alistarh

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This paper investigates the interplay between scaling laws and compression formats, exploring whether a unified scaling framework can accurately predict model performance when training occurs over various compressed representations, such as sparse, scalar-quantized, sparse-quantized or even vector-quantized formats. Our key contributions include validating a general scaling law formulation and showing that it is applicable both individually but also composably across compression types. Based on this, our main finding is demonstrating both theoretically and empirically that there exists a simple capacity metric based on the representation's ability to fit random Gaussian data which can robustly predict parameter efficiency across multiple compressed representations. On the practical side, we extend our formulation to directly compare the accuracy potential of different compressed formats, and to derive better algorithms for training over sparse-quantized formats.
Researcher Affiliation	Collaboration	Andrei Panferov ISTA Alexandra Volkova ISTA Ionut-Vlad Modoranu ISTA Vage Egiazarian ISTA Mher Safaryan ISTA Dan Alistarh ISTA & Red Hat AI Correspondence to: EMAIL.
Pseudocode	Yes	Algorithm 1: VQ Training Forward; Algorithm 2: VQ Training Backward; Algorithm 3: Adam with Straight Through Estimation (STE) and AMSGrad normalization
Open Source Code	Yes	Our source code is available at: IST-DASLab/unifiedsc-laws
Open Datasets	Yes	For our scaling law investigations, we pretrained decoder-only Transformers following the Llama architecture [34] for 30M, 50M, 100M and 200M non-embedding parameters. The models were trained on the C4 dataset [28], using the Llama-2 tokenizer [34].
Dataset Splits	No	The models were trained on the C4 dataset [28], using the Llama-2 tokenizer [34]. To ensure we operate in a data-rich regime, we use 50, 100, and 200 training tokens per model parameter for each training configuration, and train on fixed-length context windows of 512 tokens.
Hardware Specification	Yes	We use 8x80GB H100 machines for efficient training, and training one model takes on average 1 hour.
Software Dependencies	No	The paper mentions software components like "AdamW", "Llama architecture", "Llama-2 tokenizer", "PyTorch", "scipy.stats.norm.ppf", but does not provide specific version numbers for any of these.
Experiment Setup	Yes	For our scaling law investigations, we pretrained decoder-only Transformers following the Llama architecture [34] for 30M, 50M, 100M and 200M non-embedding parameters. The models were trained on the C4 dataset [28], using the Llama-2 tokenizer [34]. To ensure we operate in a data-rich regime, we use 50, 100, and 200 training tokens per model parameter for each training configuration, and train on fixed-length context windows of 512 tokens. We used AdamW [18; 23] with a 0.1 ratio of warm-up epochs with cosine scheduler. Our experimental setup is very similar to that of [9; 19; 10]. More details are provided in Appendix A. Table 2: Key architectural and training hyperparameters for Llama family models. (in Appendix A, which includes Model size, # Layers, # Heads, # Embeddings, Learning rate)
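
The training budgets implied by the Experiment Setup row can be worked out with a little arithmetic. The following is a minimal sketch, not code from the paper's repository: the helper names (`token_budget`, `num_sequences`, `lr_multiplier`) are hypothetical, the warm-up ratio is assumed to apply to the fraction of optimizer steps, and the schedule shape (linear warm-up, cosine decay to zero) is an assumption, since the quoted text only says "warm-up" and "cosine scheduler".

```python
import math

# Values quoted in the Experiment Setup row above.
MODEL_SIZES = {"30M": 30_000_000, "50M": 50_000_000,
               "100M": 100_000_000, "200M": 200_000_000}
TOKENS_PER_PARAM = (50, 100, 200)
CONTEXT_LENGTH = 512
WARMUP_RATIO = 0.1


def token_budget(n_params: int, tokens_per_param: int) -> int:
    """Total training tokens for one configuration."""
    return n_params * tokens_per_param


def num_sequences(n_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Number of fixed-length context windows needed to cover n_tokens."""
    return math.ceil(n_tokens / context_length)


def lr_multiplier(step: int, total_steps: int,
                  warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning-rate multiplier in [0, 1].

    Assumed shape: linear warm-up over the first warmup_ratio of steps,
    then cosine decay to zero. The source does not specify these details.
    """
    warmup_steps = max(1, int(warmup_ratio * total_steps))
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the token budget and sequence count for every configuration.
    for name, n_params in MODEL_SIZES.items():
        for ratio in TOKENS_PER_PARAM:
            tokens = token_budget(n_params, ratio)
            print(name, ratio, tokens, num_sequences(tokens))
```

Under these assumptions the smallest configuration (30M parameters at 50 tokens per parameter) uses 1.5 billion training tokens and the largest (200M at 200 tokens per parameter) uses 40 billion.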