Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Approximate Information Tests on Statistical Submanifolds
Authors: Michael W. Trosset, Carey E. Priebe
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Examples illustrate the efficacy of the proposed methodology. Keywords: Restricted Inference, Dimension Reduction, Information Geometry, Minimum Distance Test. Section 7 reports a small simulation study designed to explore the effect of sampling density on performance. |
| Researcher Affiliation | Academia | Michael W. Trosset EMAIL Department of Statistics Indiana University Bloomington, IN 47408, USA Carey E. Priebe EMAIL Department of Applied Mathematics & Statistics Johns Hopkins University Baltimore, MD 21218-2682, USA |
| Pseudocode | Yes | Figure 1: An approximate information test for the case of an unknown submodel that can be sampled. Steps 2–4 are essentially isomap (Tenenbaum et al., 2000), used here to represent the Riemannian structure of a statistical manifold rather than a data manifold. Details are provided in Section 6. |
| Open Source Code | No | No explicit statement about the release of source code for the methodology described in this paper is found. |
| Open Datasets | No | The paper describes experiments based on statistical models (multinomial and trinomial distributions) and simulated data (e.g., 'o = (3, 5, 4, 6, 9, 2, 1)' in the Motivating Example, and generating 'τ1, …, τ100 ~ Uniform[0, π/2]²' in Example 4), but does not use or provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper describes generating simulated random samples from hypothesized distributions for significance probability estimation and power analysis (e.g., 'Estimate a significance probability by generating simulated random samples from the hypothesized distribution p.' in Figure 1, and 'Repeating this procedure on 10000 simulated samples of size n = 30 drawn from the null distribution...' in Example 4). It does not involve predefined training/test/validation dataset splits typically found in machine learning contexts. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running the experiments. |
| Software Dependencies | No | The paper discusses various algorithms and methods (e.g., isomap, classical multidimensional scaling, majorization, Newton's method, Floyd-Warshall algorithm) but does not provide specific software names with version numbers (e.g., Python 3.8, PyTorch 1.9) used for implementation. |
| Experiment Setup | Yes | Fix σ = ψ((π/4, arctan 2)), n = 30, and α = 0.05. For m = 25, 100, 400 and a = 1, …, 5, generate τ1, …, τm ~ Uniform[0, π/2]². Compute σi = ψ(τi). Set B = 1000. 1. Construct a representation of the submanifold in ℜ². (a) Compute the pairwise Hellinger distances between σ, σ1, …, σm. Construct G by connecting vertices i and j if either vertex i is one of vertex j's K = 10 nearest neighbors or vice versa. (b) Compute the pairwise shortest path distances in G. Embed the shortest path distances in ℜ² by minimizing the raw stress criterion, obtaining z, z1, …, zm. 2. Estimate the critical value. For b = 1, …, B, draw o from a multinomial distribution with n trials and probability vector σ. (a) Compute the Hellinger distances between o/n and σ, σ1, …, σm and determine the ℓ = 3 nearest neighbors of o/n. |
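The submanifold-representation step reported above (pairwise Hellinger distances, a symmetric K-nearest-neighbor graph, shortest-path distances, and a 2-D embedding) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the map `psi` below is a hypothetical toy parameterization of trinomial probability vectors (the paper's ψ is not reproduced here), and classical MDS is substituted for the paper's raw-stress minimization as a simpler embedding of the shortest-path distances.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two probability vectors."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def psi(t):
    """Toy map from (t1, t2) in [0, pi/2]^2 to a trinomial probability
    vector; a stand-in for the paper's parameterization."""
    t1, t2 = t
    v = np.array([np.cos(t1) ** 2,
                  np.sin(t1) ** 2 * np.cos(t2) ** 2,
                  np.sin(t1) ** 2 * np.sin(t2) ** 2])
    return v / v.sum()

rng = np.random.default_rng(0)
m, K, d = 100, 10, 2

# Sample m submanifold points: tau_i ~ Uniform[0, pi/2]^2, sigma_i = psi(tau_i)
tau = rng.uniform(0, np.pi / 2, size=(m, 2))
sigma = np.array([psi(t) for t in tau])

# Step (a): pairwise Hellinger distances and a symmetric K-NN graph
D = np.array([[hellinger(p, q) for q in sigma] for p in sigma])
A = np.full((m, m), np.inf)
np.fill_diagonal(A, 0.0)
for i in range(m):
    nbrs = np.argsort(D[i])[1:K + 1]   # skip self at index 0
    A[i, nbrs] = D[i, nbrs]
    A[nbrs, i] = D[i, nbrs]            # symmetrize: edge if either is a neighbor

# Step (b): all-pairs shortest paths (Floyd-Warshall)
G = A.copy()
for k in range(m):
    G = np.minimum(G, G[:, [k]] + G[[k], :])
G[np.isinf(G)] = G[np.isfinite(G)].max()  # guard against disconnected components

# Embed shortest-path distances in R^2 via classical MDS
# (the paper minimizes the raw stress criterion instead)
J = np.eye(m) - np.ones((m, m)) / m
B = -0.5 * J @ (G ** 2) @ J
w, V = np.linalg.eigh(B)
top = np.argsort(w)[::-1][:d]
Z = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
print(Z.shape)  # (100, 2)
```

The critical-value step would then repeatedly draw multinomial samples `o` with `rng.multinomial(n, sigma_0)` and compute Hellinger distances from `o/n` to the embedded points, as described in the setup above.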