reproducibilityindex.ai

Rotation Invariant Householder Parameterization for Bayesian PCA

Authors: Rajbir Nirwan, Nils Bertschinger

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We implemented both models in the probabilistic programming language Stan. The code for the simulations is available on Github: https://github.com/RSNirwan/HouseholderBPCA. Compared to previous approaches based on parameterizing the Stiefel manifold in terms of Givens rotations (Pourzanjani et al., 2017), our model has the following advantages: First, our Householder parameters v are unconstrained, in contrast to the angular parameters of Givens rotations where the sampler might hit the boundary of the space. Secondly, we avoid the computationally demanding computation of the Jacobian determinant (Shepard et al., 2014).
Researcher Affiliation	Academia	1Department of Computer Science, Goethe University, Frankfurt, Germany 2Frankfurt Institute for Advanced Studies, Frankfurt, Germany.
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	We implemented both models in the probabilistic programming language Stan. The code for the simulations is available on Github: https://github.com/RSNirwan/HouseholderBPCA.
Open Datasets	Yes	Here, we build our own synthetic dataset with known parameters and the goal is to reconstruct the parameter values. For (N, D, Q) = (150, 5, 2) we sample $X \in \mathbb{R}^{N \times D}$ from a standard normal distribution and construct W by $W = U\Sigma \in \mathbb{R}^{D \times Q}$, where U is sampled from the Stiefel manifold with Haar measure and we specify $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2)$, where $(\sigma_1, \sigma_2) = (3.0, 1.0)$. Then, we get the observation $Y = XW^T + \epsilon$, where $\epsilon$ denotes the noise sampled from a zero mean Gaussian with a standard deviation of 0.01. We tested the model on the Breast Cancer Wisconsin dataset as well. The dataset was downloaded from the Python toolbox scikit-learn (Pedregosa et al., 2011) and contains 569 labeled datapoints with 30 features.
Dataset Splits	No	The paper describes how synthetic data was generated and mentions the Breast Cancer Wisconsin dataset size (569 datapoints, 30 features), but it does not specify any train/validation/test splits for either dataset.
Hardware Specification	No	The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'Stan (Carpenter et al., 2017)' and 'scikit-learn (Pedregosa et al., 2011)' but does not provide specific version numbers for these software components.
Experiment Setup	Yes	For the GP-LVM, the input is the transposed matrix $Y \in \mathbb{R}^{N \times D}$, where N = 30 and D = 569. We standardized the data and set $\sigma_{SE}$ and $l$ to one and only sample the latent space.
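The Research Type row quotes the paper's claim that the Householder parameters v are unconstrained. The NumPy sketch below illustrates why, under assumptions of our own: it is not the authors' Stan code, the helper names `householder` and `stiefel_from_householder` are hypothetical, and it omits the sign conventions the paper relies on for the Haar-measure argument. Any nonzero vectors v_1, ..., v_Q (of lengths D, D-1, ..., D-Q+1) define a product of reflections, and the first Q columns of that orthogonal product always have orthonormal columns, so no angle or box constraints are needed.

```python
import numpy as np

def householder(v, D):
    # Reflection I - 2 u u^T of size D x D acting only on the last len(v) coordinates
    k = len(v)
    u = np.zeros(D)
    u[D - k:] = v / np.linalg.norm(v)  # v must be nonzero (true almost surely for Gaussian draws)
    return np.eye(D) - 2.0 * np.outer(u, u)

def stiefel_from_householder(vs, D, Q):
    # Multiply the Q reflections and keep the first Q columns: U has orthonormal columns
    H = np.eye(D)
    for v in vs:
        H = H @ householder(v, D)
    return H[:, :Q]

rng = np.random.default_rng(0)
D, Q = 5, 2
vs = [rng.standard_normal(D - q) for q in range(Q)]  # unconstrained parameters
U = stiefel_from_householder(vs, D, Q)
print(np.allclose(U.T @ U, np.eye(Q)))  # True
```

Each reflection depends only on the direction of its v, so rescaling any v leaves U unchanged.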
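The Open Datasets row describes how the synthetic data is generated. A minimal NumPy sketch of that generator follows, assuming Haar sampling on the Stiefel manifold via a sign-corrected QR decomposition of a Gaussian matrix (a standard technique, not necessarily the one the authors used). Because $Y = XW^T$ with $W \in \mathbb{R}^{D \times Q}$ only has consistent shapes if X has Q columns, the sketch draws X with shape (N, Q); the function names and the seed are hypothetical.

```python
import numpy as np

def sample_stiefel_haar(D, Q, rng):
    # QR of a Gaussian matrix; flipping column signs by sign(diag(R)) makes U Haar-distributed
    A = rng.standard_normal((D, Q))
    Qm, R = np.linalg.qr(A)  # reduced QR: Qm has shape (D, Q)
    return Qm * np.sign(np.diag(R))

def make_synthetic(N=150, D=5, Q=2, sigmas=(3.0, 1.0), noise_sd=0.01, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, Q))               # latent coordinates (assumed N x Q)
    U = sample_stiefel_haar(D, Q, rng)
    W = U @ np.diag(sigmas)                       # W = U Sigma, shape (D, Q)
    eps = noise_sd * rng.standard_normal((N, D))  # zero-mean Gaussian noise, sd 0.01
    Y = X @ W.T + eps                             # observations, shape (N, D)
    return X, U, W, Y

X, U, W, Y = make_synthetic()
print(Y.shape)  # (150, 5)
```

Since U has orthonormal columns, $W^T W = \Sigma^2$, which gives a quick check that W carries the intended scales (3.0, 1.0).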
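The Experiment Setup row fixes $\sigma_{SE} = l = 1$ and samples only the latent space of a GP-LVM on the transposed Breast Cancer data. The sketch below evaluates a standard GP-LVM marginal log-likelihood log p(Y | Z) for such a setup, assuming: a squared-exponential kernel $k(z, z') = \sigma_{SE}^2 \exp(-\|z - z'\|^2 / (2l^2))$, per-feature standardization before transposing, a hypothetical noise variance `noise_var = 0.1`, a hypothetical latent dimension of 2, and random stand-in data with the same 569 x 30 shape (scikit-learn's `load_breast_cancer` provides the real data). It is an illustration, not the authors' Stan model.

```python
import numpy as np

def standardize(A):
    # Zero mean, unit variance per column (feature)
    return (A - A.mean(axis=0)) / A.std(axis=0)

def se_kernel(Z, sigma_se=1.0, lengthscale=1.0):
    # Squared-exponential kernel on the rows of Z (latent points)
    sq = np.sum(Z**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return sigma_se**2 * np.exp(-0.5 * d2 / lengthscale**2)

def gplvm_log_lik(Y, Z, noise_var=0.1, sigma_se=1.0, lengthscale=1.0):
    # Sum over the D columns of Y of log N(y_d | 0, K + noise_var * I)
    N, D = Y.shape
    K = se_kernel(Z, sigma_se, lengthscale) + noise_var * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, Y)  # L^{-1} Y, so sum(alpha**2) = sum_d y_d^T K^{-1} y_d
    return (-0.5 * np.sum(alpha**2)
            - D * np.sum(np.log(np.diag(L)))  # equals 0.5 * D * log|K|
            - 0.5 * N * D * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
data = rng.standard_normal((569, 30))  # stand-in for the 569 x 30 dataset
Y = standardize(data).T                # N = 30 rows, D = 569 columns
Z = rng.standard_normal((30, 2))       # hypothetical 2-D latent coordinates
print(Y.shape)  # (30, 569)
print(gplvm_log_lik(Y, Z))
```

With the kernel hyperparameters fixed, Z is the only quantity a sampler would need to explore, which matches the row's description of sampling only the latent space.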