Rotation Invariant Householder Parameterization for Bayesian PCA
Authors: Rajbir Nirwan, Nils Bertschinger
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented both models in the probabilistic programming language Stan. The code for the simulations is available on Github: https://github.com/RSNirwan/Householder BPCA. Compared to previous approaches based on parameterizing the Stiefel manifold in terms of Givens rotations (Pourzanjani et al., 2017), our model has the following advantages: First, our Householder parameters v are unconstrained, in contrast to the angular parameters of Givens rotations where the sampler might hit the boundary of the space. Secondly, we avoid the computationally demanding computation of the Jacobian determinant (Shepard et al., 2014). |
| Researcher Affiliation | Academia | (1) Department of Computer Science, Goethe University, Frankfurt, Germany; (2) Frankfurt Institute for Advanced Studies, Frankfurt, Germany. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We implemented both models in the probabilistic programming language Stan. The code for the simulations is available on Github: https://github.com/RSNirwan/Householder BPCA. |
| Open Datasets | Yes | Here, we build our own synthetic dataset with known parameters and the goal is to reconstruct the parameter values. For (N, D, Q) = (150, 5, 2) we sample X ∈ ℝ^{N×D} from a standard normal distribution and construct W by W = UΣ ∈ ℝ^{D×Q}, where U is sampled from the Stiefel manifold with Haar measure and we specify Σ = diag(σ₁, σ₂), where (σ₁, σ₂) = (3.0, 1.0). Then, we get the observation Y = XW^T + ϵ, where ϵ denotes the noise sampled from a zero mean Gaussian with a standard deviation of 0.01. We tested the model on the Breast Cancer Wisconsin dataset as well. The dataset was downloaded from the Python toolbox scikit-learn (Pedregosa et al., 2011) and contains 569 labeled datapoints with 30 features. |
| Dataset Splits | No | The paper describes how synthetic data was generated and mentions the Breast Cancer Wisconsin dataset size (569 datapoints, 30 features), but it does not specify any train/validation/test splits for either dataset. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Stan (Carpenter et al., 2017)' and 'scikit-learn (Pedregosa et al., 2011)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For the GP-LVM, the input is the transposed matrix Y ∈ ℝ^{N×D}, where N = 30 and D = 569. We standardized the data and set σ_SE and l to one and only sample the latent space. |
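
The synthetic dataset quoted in the Open Datasets row can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction, not the authors' code: the function names `sample_stiefel` and `make_synthetic`, the seed, and the nested-list layout are all assumptions. It also assumes that the latent matrix X has shape N×Q so that the product XW^T is well defined, and it draws U by Gram-Schmidt orthonormalization of Gaussian vectors, which is one standard way to sample from the Haar measure on the Stiefel manifold.

```python
import math
import random


def sample_stiefel(d, q, rng):
    """Draw q orthonormal vectors in R^d by Gram-Schmidt on Gaussian vectors.

    Orthonormalizing i.i.d. standard normal vectors yields a Haar-distributed
    point on the Stiefel manifold (assumed sampling scheme, not from the paper).
    """
    basis = []
    while len(basis) < q:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the vectors accepted so far.
        for u in basis:
            proj = sum(a * b for a, b in zip(u, v))
            v = [b - proj * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        if norm > 1e-12:  # redraw in the (measure-zero) degenerate case
            basis.append([b / norm for b in v])
    return basis  # q vectors of length d, i.e. the columns of U


def make_synthetic(n=150, d=5, sigmas=(3.0, 1.0), noise_sd=0.01, seed=0):
    """Return (X, W, Y) with W = U Sigma and Y = X W^T + eps."""
    rng = random.Random(seed)
    q = len(sigmas)
    cols = sample_stiefel(d, q, rng)
    # W = U Sigma, stored as a d x q nested list.
    W = [[cols[k][i] * sigmas[k] for k in range(q)] for i in range(d)]
    # Latent variables, assumed here to have shape n x q.
    X = [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(n)]
    # Observations: Y = X W^T plus zero-mean Gaussian noise.
    Y = [
        [sum(X[r][k] * W[i][k] for k in range(q)) + rng.gauss(0.0, noise_sd)
         for i in range(d)]
        for r in range(n)
    ]
    return X, W, Y
```

Calling `make_synthetic()` with the defaults reproduces the (N, D, Q) = (150, 5, 2) setting; the columns of the returned `W` are orthogonal with norms 3.0 and 1.0, which are the values a rotation-invariant model would be expected to recover.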