Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Nonparametric graphical model for counts

Authors: Arkaprava Roy, David B. Dunson

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We prove various theoretical properties, including posterior consistency, and show that our COunt Nonparametric Graphical Analysis (CONGA) approach has good performance relative to competitors in simulation studies. The methods are motivated by an application to neuron spike count data in mice.
Researcher Affiliation Academia Arkaprava Roy (EMAIL), Department of Biostatistics, University of Florida, Gainesville, FL 32611, USA; David B. Dunson (EMAIL), Department of Statistics, Duke University, Durham, NC 27708-0251, USA
Pseudocode Yes We update λ·j using the MCMC sampling scheme described in Chapter 5 of Ghosal and van der Vaart (2017) for the Dirichlet process mixture prior on λ_ij, based on the above conditional likelihood. For clarity this algorithm is described below:
(i) Calculate the probability vector Q_j for each j such that Q_j(k) = Pois(X_ij; λ_kj) and Q_j(i) = M_j · Ga(λ_ij; a + X_ij, b + 1).
(ii) Sample an index l from 1:T with probability Q_j / Σ_k Q_j(k).
(iii) If l ≠ i, set λ_ij = λ_lj; otherwise sample a new value as described below.
(iv) Sample M_j from Gamma(c + U, d − log δ), where U is the number of unique elements in λ·j, δ is sampled from Beta(M_j, T), and M_j ~ Ga(c, d) a priori.
When a new value for λ_tj has to be generated in step (iii), we consider the following scheme:
(i) Generate a candidate λᶜ_tj from Gamma(a + X_tj, b + 1).
(ii) Adjust the update as λᶜ_tj = λ⁰_tj + K₁(λᶜ_tj − λ⁰_tj), where λ⁰_tj is the current value and K₁ < 1 is a tuning parameter, tuned with respect to the acceptance rate of the resulting Metropolis–Hastings (MH) step.
(iii) Use the pseudo-likelihood based on the conditional likelihoods in (4) to calculate the MH acceptance probability.
To generate β, we consider a new likelihood in which the standardized (tan⁻¹(X_tl))^θ follows a multivariate Gaussian distribution with precision matrix Ω such that Ω_pq = Ω_qp = β_pq for p < q and Ω_pp = (Var((tan⁻¹(X_tl))^θ)⁻¹)_pp; thus the diagonal entries do not change over iterations. We update Ω_{l,−l} = {Ω_{l,i} : i ≠ l} successively, and define Ω_{−l,−l} as the submatrix obtained by removing the l-th row and column. Let s = F(X)ᵀ F(X); thus s is the P × P Gram matrix of (tan⁻¹ X)^θ, standardized over columns.
(i) Generate an update for Ω_{l,−l} using the posterior distribution as in Wang (2012): a candidate Ωᶜ_{l,−l} is generated from MVN(−C s_{l,−l}, C), where C = ((s_22 + γ) Ω⁻¹_{−l,−l} + D_l⁻¹)⁻¹ and D_l is the prior variance corresponding to Ω_{l,−l}.
(ii) Adjust the update as Ωᶜ_{l,−l} = Ω⁰_{l,−l} + K₂ (Ωᶜ_{l,−l} − Ω⁰_{l,−l}) / ‖Ωᶜ_{l,−l} − Ω⁰_{l,−l}‖₂, where Ω⁰_{l,−l} is the current value and K₂ is a tuning parameter, tuned with respect to the acceptance rate of the following MH step; K₂ should always be less than ‖Ωᶜ_{l,−l} − Ω⁰_{l,−l}‖₂.
(iii) Use the pseudo-likelihood based on the conditional likelihoods in (4), multiplying over t, to calculate the MH acceptance probability, with proposal densities π(θ⁰ | θᶜ) = π(θ_G) and π(θᶜ | θ⁰) = π(θ_G), where θ_G is the original Gibbs update.
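The per-column λ update quoted above can be sketched in a few lines of Python. This is a hedged illustration only: the paper's pseudo-likelihood from its equation (4) is not reproduced in the quote, so the MH acceptance below uses the single-observation Poisson term as a stand-in, and all function and variable names are hypothetical rather than taken from the CONGA code.

```python
import math
import numpy as np

def pois_pmf(x, lam):
    # Poisson pmf via log-space for stability
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def gamma_pdf(x, shape, rate):
    # Gamma density with shape/rate parameterization
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(x)
                    - rate * x - math.lgamma(shape))

def update_lambda_column(X_j, lam_j, M_j, a=1.0, b=1.0, K1=0.5, rng=None):
    """One sweep over i of steps (i)-(iii) for a single column j (sketch).

    The MH step uses the Poisson term as a stand-in for the paper's
    pseudo-likelihood; hyperparameter defaults follow the stated a = b = 1.
    """
    rng = rng or np.random.default_rng(0)
    T = len(lam_j)
    lam_j = lam_j.copy()
    for i in range(T):
        # (i) Q_j(k) = Pois(X_ij; lam_kj) for existing atoms; the i-th slot
        #     carries the "fresh draw" weight M_j * Ga(lam_ij; a + X_ij, b + 1)
        Q = np.array([pois_pmf(X_j[i], lam_j[k]) for k in range(T)])
        Q[i] = M_j * gamma_pdf(lam_j[i], a + X_j[i], b + 1.0)
        # (ii) sample an index l with probability proportional to Q
        l = rng.choice(T, p=Q / Q.sum())
        if l != i:
            # (iii) copy an existing atom
            lam_j[i] = lam_j[l]
        else:
            # propose from Gamma(a + X_ij, b + 1), shrink toward the current
            # value with K1 < 1, then accept/reject with an MH step
            cand = rng.gamma(a + X_j[i], 1.0 / (b + 1.0))
            prop = lam_j[i] + K1 * (cand - lam_j[i])
            ratio = pois_pmf(X_j[i], prop) / pois_pmf(X_j[i], lam_j[i])
            if rng.uniform() < min(1.0, ratio):
                lam_j[i] = prop
    return lam_j
```

The shrinkage step `prop = lam + K1 * (cand - lam)` mirrors the quoted adjustment λᶜ = λ⁰ + K₁(λᶜ − λ⁰): K₁ keeps proposals close to the current state so the MH acceptance rate can be tuned.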
Open Source Code Yes All the required functions to fit the CONGA algorithm along with a supplementary R code with an example usage are provided at https://github.com/royarkaprava/CONGA.
Open Datasets No The methods are motivated by an application to neuron spike count data in mice. ... The dataset records neuron spike counts in mice across 37 neurons in the brain under the influence of three different external stimuli: 2-D sinusoids with vertical gradient, horizontal gradient, and the sum. These neurons are from the same depth of the visual cortex of a mouse. The data are collected for around 400 time points.
Dataset Splits No The paper does not provide specific details on how the neuron spike count data or simulated data were split into training, validation, or test sets.
Hardware Specification No The paper does not provide specific details on the hardware used for running the experiments, such as CPU or GPU models.
Software Dependencies No We compare our method CONGA with TPGM, SPGM, LPGM, huge, BDgraph, and ssgraph. The first three are available in the R package XMRF, and the last two are in the R packages BDgraph and ssgraph, respectively. The function huge is from the R package huge, which fits a nonparanormal graphical model.
Experiment Setup Yes We choose ν3 = 100, which is the prior variance of the normal prior of βjl for all j, l. The choice ν3 = 100 makes the prior weakly informative. The parameter γ is chosen to be 5 as given in Wang (2012). For the gamma distribution, we consider a = b = 1. For the Dirichlet process mixture, we take c = d = 10.
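The stated hyperparameter choices can be collected in one place. This is a sketch only; the container and key names are hypothetical, while the values are exactly those quoted above.

```python
# Hypothetical names; values are the prior settings stated in the paper.
priors = {
    "nu3": 100.0,   # variance of the Normal prior on beta_jl (weakly informative)
    "gamma": 5.0,   # regularization parameter, as in Wang (2012)
    "a": 1.0,       # Gamma(a, b) shape for the base measure
    "b": 1.0,       # Gamma(a, b) rate for the base measure
    "c": 10.0,      # Gamma(c, d) shape for the DP mass M_j
    "d": 10.0,      # Gamma(c, d) rate for the DP mass M_j
}
```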