Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
QTIP: Quantization with Trellises and Incoherence Processing
Authors: Albert Tseng, Qingyao Sun, David Hou, Christopher M. De Sa
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "Here, we present experiments quantizing the Llama family of models with QTIP [33, 34, 26]." Also "Table 3: Wikitext2 and C4 perplexity (↓), ctx. 4096, QTIP with pure-computed codes." |
| Researcher Affiliation | Academia | Albert Tseng (Cornell University), Qingyao Sun (Cornell University), David Hou, Christopher De Sa (Cornell University) |
| Pseudocode | Yes | Algorithm 1: Computed Gaussian Code 1MAD; Algorithm 2: Computed Gaussian Code 3INST; Algorithm 3: Hybrid Computed-Lookup 2D Gaussian Code HYB; Algorithm 4: Tail-biting Trellis Approx.; Algorithm 5: QTIP with Block LDLQ. |
| Open Source Code | Yes | "Our code is available at https://github.com/Cornell-RelaxML/qtip." |
| Open Datasets | Yes | "All sequences were sampled from the RedPajama dataset [7]." and "We use the OPTQ Wikitext2 and C4 test splits to calculate perplexity [14]." |
| Dataset Splits | No | The paper mentions using 'Wikitext2 and C4 test splits' and data for Hessian generation, but does not explicitly provide the train/validation/test dataset splits (e.g., percentages, sample counts, or specific predefined split citations for the entire training and validation process) needed for reproduction. |
| Hardware Specification | Yes | "Table 4: Batch size 1 decoding throughput on a RTX6000 Ada (960GB/s mem. BW)." and "Table 17: Decoding speed on different Ampere and Lovelace GPUs," which lists the RTX 3090, RTX A6000 (Ampere), and RTX 6000 Ada. |
| Software Dependencies | No | The paper mentions running on 'NVIDIA GPUs' and discussing 'ALU instructions', but it does not specify software dependencies like exact Python, PyTorch, or CUDA versions needed for reproducibility. |
| Experiment Setup | Yes | "Here, we use 1MAD and 3INST with L = 16, V = 1, Tx = Ty = 16." and "Here, we use the hybrid lookup-computed code with L = 16, V = 2, Tx = Ty = 16, Q = 9." and "to evaluate this, we fine-tune using QuIP#'s methodology, tuning both the codebook entries and the as-yet-unquantized weights in a blockwise fashion." |
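The Pseudocode and Experiment Setup rows only name QTIP's components. As a rough illustration, the Python sketch below shows two of the ideas they refer to. The first is a computed Gaussian code in the spirit of 1MAD. It assumes 1MAD maps an L-bit trellis state to an approximately Gaussian value with one multiply-add followed by a byte sum. The second is reading the states of a tail-biting trellis, assuming QTIP's bitshift structure: consecutive L-bit states overlap, and the last state wraps onto the first. This is a sketch under stated assumptions, not the authors' implementation. The LCG constants `LCG_A` and `LCG_B`, the helper names `one_mad` and `bitshift_states`, and the choice of k = 2 bits per weight with one 16×16 tile treated as a single sequence are all illustrative assumptions. Consult the official repository for the actual kernels.

```python
import math
import random

# ASSUMPTION: illustrative LCG constants (believed to match the QTIP paper);
# verify against the official repository before relying on them.
LCG_A = 34038481
LCG_B = 76625530


def one_mad(state: int) -> float:
    """Hypothetical 1MAD-style computed code: one L-bit state -> approx. Gaussian value."""
    # One multiply-add reduced mod 2^32: a linear congruential step.
    x = (LCG_A * state + LCG_B) & 0xFFFFFFFF
    # Sum the four bytes; a sum of four roughly uniform bytes is roughly Gaussian.
    s = (x & 255) + ((x >> 8) & 255) + ((x >> 16) & 255) + ((x >> 24) & 255)
    # Standardize: mean 4 * 127.5 = 510, std sqrt(4 * (256**2 - 1) / 12) ~ 147.8.
    return (s - 510) / 147.8


def bitshift_states(bits: list[int], L: int, kV: int) -> list[int]:
    """Hypothetical helper: read the L-bit states of a tail-biting bitshift trellis.

    State t is the L-bit window starting at bit t * kV (first bit is the MSB).
    Indices wrap modulo the stream length, so the last state overlaps the
    first one: this wrap-around is the tail-biting property.
    """
    n = len(bits)
    if n % kV != 0 or L > n:
        raise ValueError("stream length must be a multiple of kV and at least L")
    states = []
    for t in range(n // kV):
        s = 0
        for j in range(L):
            s = (s << 1) | bits[(t * kV + j) % n]
        states.append(s)
    return states


if __name__ == "__main__":
    # Hypothetical setup: one 16x16 tile (T = 256 weights), L = 16, V = 1,
    # k = 2 bits per weight, so the tile costs exactly T * k * V = 512 bits.
    L, V, k, T = 16, 1, 2, 16 * 16
    rng = random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(T * k * V)]
    states = bitshift_states(bits, L, k * V)
    weights = [one_mad(s) for s in states]
    print(len(bits), len(weights))  # 512 256
    print(f"std of the 1MAD normalizer: {math.sqrt(4 * (256**2 - 1) / 12):.1f}")
```

Running the script prints `512 256` for the sizes. With k = 2 and V = 1, a tail-biting trellis stores exactly k bits per weight, so 256 weights need 512 bits. No extra bits are spent on an initial state, because the first state is recovered from the bits that wrap around from the end of the stream.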