Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Point4Bit: Post Training 4-bit Quantization for Point Cloud 3D Detection

Authors: Jianyu Wang, Yu Wang, Shengjie Zhao, Sifan Zhou

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To further assess its generalizability across different tasks, we conduct additional experiments on 3D object classification and semantic segmentation. For classification, we adopt the ModelNet40 [42] and ScanObjectNN [39] datasets, which are widely used as benchmarks in this domain, and evaluate performance using Overall Accuracy (OA) and mean Class Accuracy (mAcc). For semantic segmentation, evaluations are performed on the real-world LiDAR dataset SemanticKITTI [3], using mean Intersection-over-Union (mIoU) as the evaluation metric.
Researcher Affiliation	Academia	Jianyu Wang¹, Yu Wang¹, Shengjie Zhao¹, Sifan Zhou². ¹Tongji University; ²Carnegie Mellon University. Corresponding author: EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Point4bit Quantization. Input: Pretrained FP model with $N$ layers; calibration dataset $D_c$. Output: quantization parameters of both activation and weight in network, i.e., weight scale $s_w$, weight zero-point $z_w$, activation scale $\{s^{fg}_1, \dots, s^{fg}_m, s^{bg}\}$.
Open Source Code	No	This paper fully discloses all the information needed to reproduce the main experimental results in supplementary material and we will release our code on GitHub soon.
Open Datasets	Yes	We evaluate the effectiveness of the Point4bit framework primarily on the large-scale autonomous driving dataset nuScenes [4] for the 3D object detection task. To further assess its generalizability across different tasks, we conduct additional experiments on 3D object classification and semantic segmentation. For classification, we adopt the ModelNet40 [42] and ScanObjectNN [39] datasets, which are widely used as benchmarks in this domain, and evaluate performance using Overall Accuracy (OA) and mean Class Accuracy (mAcc). For semantic segmentation, evaluations are performed on the real-world LiDAR dataset SemanticKITTI [3], using mean Intersection-over-Union (mIoU) as the evaluation metric.
Dataset Splits	Yes	NuScenes dataset [4] uses a 32-beam LiDAR to collect data from 1000 urban driving scenes, annotated with 3D bounding boxes for 10 object classes. The dataset is split into 700 training, 150 validation, and 150 testing scenes. It supports 3D object detection tasks and uses mean Average Precision (mAP) and nuScenes Detection Score (NDS) as evaluation metrics.
Hardware Specification	Yes	We execute all experiments on a single Nvidia Tesla V100 GPU. As shown in Tab. 10, on the NVIDIA Jetson AGX Orin platform, which is a common onboard devices for autonomous driving in the community, the quantized model achieves an inference speed of 31.1 FPS, which is approximately 3× faster than its FP counterpart at 12.5 FPS. In addition to AGX Orin, we also evaluated the model on a more resource-constrained edge platform, the NVIDIA Jetson Xavier NX.
Software Dependencies	No	The paper does not provide specific software dependency versions (e.g., the Python, PyTorch, or CUDA versions used for its implementation); it mentions only external tools such as TensorRT 8.6+ and spconv.
Experiment Setup	Yes	For the nuScenes dataset, we randomly sample 256 point cloud frames from the train set as calibration data, accounting for only 0.91% of the total training frames (256/28,130). Calibration is performed with a batch size of 4. The quantization hyperparameters are set as follows: m = 2 defines the number of CDF-based quantization intervals; m1 = 0.2 specifies the proportion of high-activation voxels selected as foreground for fine-grained activation quantization; and m2 = 0.8 indicates the proportion of important weights selected for reconstruction based on gradient sensitivity. Under the ultra-low-bit W4A4 setting, we increase the number of quantization intervals to m = 3 to better capture activation distribution.
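
The calibration setup quoted in the Experiment Setup row can be sketched in a few lines of Python. This is only an illustration of the reported numbers (256 frames out of 28,130, batch size 4, and the m, m1, m2 hyperparameters), not the authors' code, which is not yet released. The names CalibrationConfig, sample_calibration_indices, and make_batches are hypothetical, and the random seed is an assumption, since the quoted text does not state one.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CalibrationConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    num_calibration_frames: int = 256  # frames sampled from the train set
    num_train_frames: int = 28_130     # total nuScenes training frames, as quoted
    batch_size: int = 4                # calibration batch size
    m: int = 2                         # CDF-based quantization intervals (3 under W4A4)
    m1: float = 0.2                    # proportion of high-activation voxels treated as foreground
    m2: float = 0.8                    # proportion of important weights selected for reconstruction


def sample_calibration_indices(cfg: CalibrationConfig, seed: int = 0) -> list[int]:
    """Draw frame indices uniformly without replacement (the seed is an assumption)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(cfg.num_train_frames), cfg.num_calibration_frames))


def make_batches(indices: list[int], batch_size: int) -> list[list[int]]:
    """Split the sampled indices into consecutive calibration batches."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


cfg = CalibrationConfig()
indices = sample_calibration_indices(cfg)
calib_batches = make_batches(indices, cfg.batch_size)
fraction = cfg.num_calibration_frames / cfg.num_train_frames

# Prints: 0.91% of training frames, 64 batches
print(f"{fraction:.2%} of training frames, {len(calib_batches)} batches")
```

The computed fraction matches the 0.91% figure in the quoted text, and a batch size of 4 yields 64 calibration batches. Which frames the authors actually sampled is not recoverable from the quoted text, so any seed reproduces only the procedure, not the exact calibration set.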