Random Function Priors for Correlation Modeling
Authors: Aonan Zhang, John Paisley
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirical results on three text datasets: a 5K subset of New York Times, 20Newsgroups, and NeurIPS. Their basic statistics are shown in Table 2. In Table 1, we compare three Bayesian nonparametric models: hierarchical Dirichlet process (HDP) (Teh et al., 2005), discrete infinite logistic normal (DILN) (Paisley et al., 2012b), and our population random measure embedding (PRME) using a 4-layer MLP with batch normalization. As Table 1 shows, PRME consistently performs better than HDP and DILN. (A sketch of such an MLP appears after the table.) |
| Researcher Affiliation | Academia | Department of Electrical Engineering & Data Science Institute, Columbia University, New York, USA. |
| Pseudocode | Yes | Algorithm 1: Feature paintboxes model. Algorithm 2: Stochastic inference algorithm. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the work described, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | We show empirical results on three text datasets: a 5K subset of New York Times, 20Newsgroups, and NeurIPS. Their basic statistics are shown in Table 2. |
| Dataset Splits | No | The paper mentions 'For each test document Xn, we do a 90%/10% split into training words Xn,TR and testing words Xn,TS.' This refers to a split within each test document for perplexity calculation, not a global train/validation/test split for the datasets used in the experiments. No explicit validation split is provided for the main datasets. (A sketch of the per-document split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2014)' as an optimizer and various neural network architectures (MLP, ResNet) but does not provide specific version numbers for software dependencies such as deep learning frameworks or programming languages. |
| Experiment Setup | Yes | All gradient updates are done via Adam (Kingma & Ba, 2014) with learning rate 10^-4. We tune γ0, fix the truncation level K = 100, and set a = 1, b = 1, α = 1, β = 5 for fair comparisons. Let ρ^(t) = (t0 + t)^(-κ) be the step size with some constant t0 and κ ∈ (0.5, 1]. For the larger one-million-document New York Times dataset, we show topic paintboxes learned with stochastic PRME in Figure 4. We set t0 = 100, κ = 0.75 and use a 6-layer MLP. In Figure 5(c), we compare run times for updating local parameters ([Z, C] for PRME) and global parameters ([θ, ℓ, V, g, f] for PRME) with batch size 500. (See the step-size and optimizer sketch after the table.) |
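
The Research Type row above notes that PRME uses a 4-layer MLP with batch normalization. Below is a minimal PyTorch sketch of such an architecture; the layer widths, activation, and input/output dimensions are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

def make_mlp(in_dim, hidden_dim, out_dim, n_layers=4):
    """A generic n-layer MLP with batch normalization on the hidden layers."""
    layers, dim = [], in_dim
    for _ in range(n_layers - 1):
        layers += [nn.Linear(dim, hidden_dim), nn.BatchNorm1d(hidden_dim), nn.ReLU()]
        dim = hidden_dim
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

# Example: map a 100-dimensional input to a 50-dimensional embedding;
# all sizes here are illustrative only.
mlp = make_mlp(in_dim=100, hidden_dim=256, out_dim=50)
```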
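
The Dataset Splits row quotes the paper's per-document 90%/10% split of each test document's words into Xn,TR and Xn,TS for held-out perplexity. A minimal sketch of that split, assuming a document is an array of vocabulary indices (the function name and seeding are illustrative):

```python
import numpy as np

def split_test_document(word_ids, train_frac=0.9, seed=0):
    """Randomly split one test document's words into training (90%) and
    testing (10%) subsets, as done per test document for perplexity."""
    rng = np.random.default_rng(seed)
    word_ids = np.asarray(word_ids)
    perm = rng.permutation(len(word_ids))
    n_train = int(round(train_frac * len(word_ids)))
    return word_ids[perm[:n_train]], word_ids[perm[n_train:]]

# Example on a toy document of word indices
x_tr, x_ts = split_test_document([3, 17, 17, 42, 8, 99, 5, 5, 23, 61])
```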
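
The Experiment Setup row reports Adam with learning rate 10^-4 and a step size ρ^(t) = (t0 + t)^(-κ) with t0 = 100, κ = 0.75 for the stochastic run. The sketch below shows those two pieces; the parameter tensor is a stand-in, not the paper's actual global variational parameters.

```python
import torch

def step_size(t, t0=100.0, kappa=0.75):
    """Robbins-Monro style step size rho^(t) = (t0 + t)^(-kappa), kappa in (0.5, 1]."""
    return (t0 + t) ** (-kappa)

# Gradient updates via Adam with learning rate 1e-4; `params` stands in for the
# global parameters [theta, l, V, g, f] (the tensor shape here is an assumption).
params = [torch.zeros(100, 10, requires_grad=True)]
optimizer = torch.optim.Adam(params, lr=1e-4)
```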