Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test
Authors: Seunghoon Paik, Michael Celentano, Alden Green, Ryan J. Tibshirani
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that the RKS test has asymptotically full power at distinguishing any distinct pair P ≠ Q of distributions, derive its asymptotic null distribution, and carry out experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test. We complement our theory with numerical experiments to explore the operating characteristics of the RKS test compared to other popular nonparametric two-sample tests. 4. Experiments |
| Researcher Affiliation | Academia | Seunghoon Paik (1), Michael Celentano (1), Alden Green (2), Ryan J. Tibshirani (1). (1) Department of Statistics, University of California, Berkeley, CA 94720, USA; (2) Department of Statistics, Stanford University, Stanford, CA 94305, USA |
| Pseudocode | Yes | For concreteness, we summarize our computational approach below in Algorithm 1. Algorithm 1 RKS test statistic |
| Open Source Code | Yes | Python code to replicate all of our experimental results is available at https://github.com/100shpaik/. |
| Open Datasets | No | For each dimension d, we consider five settings for P, Q, which are described in Table 1. In each setting, the parameter v controls the discrepancy between P and Q, but its precise meaning depends on the setting. The settings were broadly chosen in order to study the operating characteristics of the RKS test when differences between P and Q occur in one direction (settings 1–4), and in all directions (setting 5). Among the settings in which the differences occur in one direction, we also investigate different varieties (settings 1 and 2: mean shift under different geometries, setting 3: tail difference, setting 4: variance difference). Figure 2 visualizes samples drawn from each task in d = 2 dimensions, whereas Figure 3 exaggerates the deviation between P, Q (larger values of v) to better illustrate the geometry. Finally, we note that since the RKS test is rotationally invariant, the fact that the chosen differences in Table 1 are axis-aligned is just a matter of convenience, and the results would not change if these differences instead occurred along arbitrary directions in R^d. Table 1: Experimental settings. Here N_d(µ, Σ) means the d-dimensional normal distribution with mean µ and covariance Σ, and t(v) means the t distribution with v degrees of freedom. |
| Dataset Splits | No | We fix the sample sizes to m = n = 512 throughout, and study four choices of dimension: d = 2, 4, 8, 16. For each setting, we compute these test statistics under the null, where each xi and yi are sampled i.i.d. from the mixture m/(m+n) P + n/(m+n) Q, and under the alternative, where xi are i.i.d. from P and yi from Q. We then repeat this 100 times (draws of samples, and computation of test statistics), and trace out ROC curves (true positive versus false positive rates) as we vary the rejection threshold for each test. |
| Hardware Specification | No | For k ≥ 1, we apply the torch.optim.Adam optimizer (a variation on gradient descent), as implemented in PyTorch, to (10). For k = 0, such a first-order scheme is not applicable due to the fact that the gradient of the 0th degree ridge spline (w^T x − b)^0_+ = 1{w^T x ≥ b} (with respect to w and b) is almost everywhere zero. As a surrogate, we directly approximate the optimum (w*, b*) in (2) using logistic regression, where the class labels identify samples from P versus Q, as implemented in sklearn.linear_model.LogisticRegression in Python. |
| Software Dependencies | No | For k ≥ 1, we apply the torch.optim.Adam optimizer (a variation on gradient descent), as implemented in PyTorch, to (10). For k = 0, such a first-order scheme is not applicable due to the fact that the gradient of the 0th degree ridge spline (w^T x − b)^0_+ = 1{w^T x ≥ b} (with respect to w and b) is almost everywhere zero. As a surrogate, we directly approximate the optimum (w*, b*) in (2) using logistic regression, where the class labels identify samples from P versus Q, as implemented in sklearn.linear_model.LogisticRegression in Python. |
| Experiment Setup | Yes | For k ≥ 1, we apply the torch.optim.Adam optimizer (a variation on gradient descent), as implemented in PyTorch, to (10). We use a betas parameter (0.9, 0.99), learning rate 0.5, number of iterations T = 200, penalty parameter λ = 1, and number of neurons N = 10. To enforce the nonnegativity condition on b, we project b to [0, ∞) after each gradient step. Rather than take the last iterate, we choose the maximal IPM value among the iterates (after rescaling by the RTV^k seminorm of each iterate so that it lies in the unit seminorm ball). Further, we repeat this over three random initializations, and select the best resultant IPM value to be the final output. We fix the sample sizes to m = n = 512 throughout, and study four choices of dimension: d = 2, 4, 8, 16. For the RKS tests, we examine smoothness degrees k = 0, 1, 2, 3, and we center the input data to have sample mean zero jointly across both samples. |
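The ROC protocol quoted under Dataset Splits (100 repetitions under the null mixture and under the alternative, sweeping the rejection threshold) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the Gaussian draws are hypothetical stand-ins for actual test-statistic values.

```python
import numpy as np

# Illustrative sketch of the ROC protocol: statistics are computed under the
# null (samples drawn from the pooled mixture) and under the alternative
# (x_i ~ P, y_i ~ Q), over 100 repetitions; Gaussian placeholders stand in
# for actual test-statistic draws.
rng = np.random.default_rng(0)
reps = 100
null_stats = rng.normal(0.0, 1.0, size=reps)  # placeholder null draws
alt_stats = rng.normal(1.5, 1.0, size=reps)   # placeholder alternative draws

# Sweep the rejection threshold from high to low; record the false positive
# rate (rejections under the null) against the true positive rate.
thresholds = np.sort(np.concatenate([null_stats, alt_stats]))[::-1]
fpr = np.array([(null_stats >= t).mean() for t in thresholds])
tpr = np.array([(alt_stats >= t).mean() for t in thresholds])
```

Plotting `tpr` against `fpr` traces out one ROC curve per test, as described in the excerpt.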
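The k = 0 surrogate quoted under Hardware Specification and Software Dependencies (logistic regression in place of gradient descent) admits a minimal sketch, assuming a synthetic mean-shift pair for P, Q. The witness form f(z) = 1{w^T z ≥ b} follows the excerpt; the data and the mapping from `LogisticRegression` coefficients to (w*, b*) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(512, 2))  # sample from P (hypothetical)
y = rng.normal(0.5, 1.0, size=(512, 2))  # sample from Q, mean-shifted

# Class labels identify samples from P (0) versus Q (1).
data = np.vstack([x, y])
labels = np.r_[np.zeros(len(x)), np.ones(len(y))]

clf = LogisticRegression().fit(data, labels)
w = clf.coef_.ravel()          # approximate witness direction w*
b = -clf.intercept_.item()     # decision boundary w^T z - b = 0

# Degree-0 ridge-spline witness f(z) = 1{w^T z >= b}; the statistic is the
# difference in empirical means of f over the two samples.
fx = (x @ w >= b).astype(float)
fy = (y @ w >= b).astype(float)
stat = abs(fx.mean() - fy.mean())
```

The logistic-regression direction serves only as a proxy for the optimal halfspace; any monotone link yielding a separating hyperplane would play the same role here.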
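The k ≥ 1 procedure quoted under Experiment Setup can also be sketched with the stated hyperparameters (betas (0.9, 0.99), learning rate 0.5, T = 200, λ = 1, N = 10, projection of b onto [0, ∞), best rescaled iterate). The ridge-spline parameterization and the RTV^k seminorm proxy sum_j |a_j|·||w_j||^k are assumptions, not taken from the paper, and the data are hypothetical.

```python
import torch

torch.manual_seed(0)
m = n = 512
d, N, k, lam = 2, 10, 1, 1.0

x = torch.randn(m, d)        # sample from P (hypothetical)
y = torch.randn(n, d) + 0.5  # sample from Q, mean-shifted

# Ridge-spline network f(z) = sum_j a_j * (w_j^T z - b_j)_+^k
w = torch.randn(N, d, requires_grad=True)
b = torch.rand(N, requires_grad=True)
a = torch.randn(N, requires_grad=True)

def f(z):
    return (torch.clamp(z @ w.T - b, min=0.0) ** k) @ a

opt = torch.optim.Adam([w, b, a], lr=0.5, betas=(0.9, 0.99))
best = 0.0
for _ in range(200):  # T = 200 iterations
    ipm = f(x).mean() - f(y).mean()
    # Assumed RTV^k seminorm proxy for this parameterization.
    seminorm = (a.abs() * w.norm(dim=1) ** k).sum()
    loss = -ipm + lam * seminorm  # penalized objective, lambda = 1
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        b.clamp_(min=0.0)  # project b onto [0, inf) after each step
        # Track the best IPM value after rescaling each iterate into the
        # unit seminorm ball, rather than keeping the last iterate.
        sem = (a.abs() * w.norm(dim=1) ** k).sum().clamp(min=1e-12)
        best = max(best, ((f(x).mean() - f(y).mean()).abs() / sem).item())
```

Per the excerpt, this whole loop would be repeated over three random initializations, keeping the best resulting IPM value.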