Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PAC-Bayes Bounds for Multivariate Linear Regression and Linear Autoencoders

Authors: Ruixin Guo, Ruoming Jin, Xinyu Li, Yang Zhou

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate that our bound is tight and correlates well with practical ranking metrics such as Recall@K and NDCG@K. |
| Researcher Affiliation | Academia | Ruixin Guo, Kent State University; Ruoming Jin, Kent State University; Xinyu Li, Kent State University; Yang Zhou, Auburn University |
| Pseudocode | Yes | Algorithm 1: Computing the PAC-Bayes bound for LAEs. Input: Σ_hh, p, δ, σ, Λ = {λ_1, λ_2, ..., λ_L}, X, Y, and an LAE model W (with diag(W) = 0). Compute Σ_xx, Σ_xy, Σ_yy with Σ_hh, p by Lemma 4.4. Set π = N(W, σ²I) (i.e., let U_0 = W such that W is the mean prior of π). Let G = {} be a set to store the results. For each λ_i in Λ: compute ρ = N(U, S) with π, λ_i by Theorem 5.2 (b); compute D(ρ ∥ π) with ρ, π by (42) in Appendix F; compute E_{W∼ρ}[R_emp(W)] with ρ, X, Y by (40) in Appendix F; compute E_{W∼ρ}[R_true(W)] with ρ, Σ_xx, Σ_xy, Σ_yy by (41) in Appendix F, and let it be the left-hand side of (14), denoted LH_i; compute E_π[e^{λ R_true(W)}] with π, Σ_xx, Σ_xy, Σ_yy, λ_i by (12); compute the right-hand side of (14), denoted RH_i, with E_{W∼ρ}[R_emp(W)], D(ρ ∥ π), E_π[e^{λ R_true(W)}]; append (LH_i, RH_i) to G. Output: the pair (LH, RH) in G, where RH = min_{1≤i≤L} RH_i. |
| Open Source Code | Yes | We have provided the code and the links to the open datasets (MovieLens 20M, Netflix, MSD) in the supplemental material. |
| Open Datasets | Yes | We use three datasets: MovieLens 20M (ML-20M), Netflix and MSD, with their details shown in Table 2 in Appendix F. ... We have provided the code and the links to the open datasets (MovieLens 20M, Netflix, MSD) in the supplemental material. |
| Dataset Splits | Yes | We split it into a training set H_train ∈ {0, 1}^{n×(m−m′)} and a test set H_test ∈ {0, 1}^{n×m′} by setting m′ = 0.3m. The test set H_test is further split into an input matrix X and a target matrix Y, with a hold-out fraction 1 − p = 1/2. |
| Hardware Specification | Yes | Our experiments run on a machine with 500 GB RAM and an Nvidia A100 GPU. The GPU has 80 GB RAM. |
| Software Dependencies | No | The paper does not explicitly mention specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set γ in (4) to values of 50, 100, 200, 500, 1000, 2000 and 5000 to generate seven different LAE models and evaluate them accordingly. Other inputs of the algorithm are set as follows: δ = 0.01, σ = 0.001, Λ = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512}. |
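
The data split quoted under Dataset Splits and the outer loop of Algorithm 1 can be illustrated with a minimal NumPy sketch. It is not the authors' code. The function names (`split_columns`, `hold_out`, `tightest_bound`) and the callable `rhs_fn` are hypothetical. The sketch assumes that the split is a random partition of the columns of the data matrix, and that each test entry is assigned independently to X with probability p and to Y otherwise. The quoted text does not confirm either assumption. The right-hand side of (14) is not reproduced here, so `rhs_fn` stands in for whatever routine computes it for a given λ.

```python
import numpy as np


def split_columns(H, test_frac=0.3, seed=0):
    """Partition the columns of an n x m binary matrix into train and test.

    Assumption: the split is a uniformly random partition of the columns,
    with m' = round(test_frac * m) columns going to the test set.
    """
    rng = np.random.default_rng(seed)
    m = H.shape[1]
    m_test = int(round(test_frac * m))
    perm = rng.permutation(m)
    return H[:, perm[m_test:]], H[:, perm[:m_test]]


def hold_out(H_test, p=0.5, seed=0):
    """Split H_test into an input matrix X and a target matrix Y.

    Assumption: each entry is kept in X with probability p and moved to Y
    with probability 1 - p, so that X + Y == H_test.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(H_test.shape) < p
    X = np.where(keep, H_test, 0)
    Y = np.where(keep, 0, H_test)
    return X, Y


def tightest_bound(lambdas, rhs_fn):
    """Evaluate a bound on a grid of lambda values and return the smallest.

    rhs_fn is a hypothetical callable mapping lambda to the right-hand side
    of the bound; this mirrors only the final 'min over i' step.
    """
    results = [(lam, rhs_fn(lam)) for lam in lambdas]
    return min(results, key=lambda pair: pair[1])
```

With the reported settings, `lambdas` would be the grid Λ = {1, 2, 4, ..., 512}, `test_frac` would be 0.3, and `p` would be 1/2.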