Multimodal Poisson Gamma Belief Network
Authors: Chaojie Wang, Bo Chen, Mingyuan Zhou
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on bi-modal data consisting of images and tags show that the mPGBN can easily impute a missing modality and hence is useful for both image annotation and retrieval. We further demonstrate that the mPGBN achieves state-of-the-art results on unsupervisedly extracting latent features from multimodal data. |
| Researcher Affiliation | Academia | Chaojie Wang, Bo Chen: National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing & Understanding, Xidian University, Xi'an, Shaanxi, China. Mingyuan Zhou: McCombs School of Business, University of Texas at Austin, Austin, TX 78712, USA |
| Pseudocode | No | The paper provides mathematical formulations of the model but does not include any pseudocode or algorithm blocks; a hedged sketch of the generative structure is given after this table. |
| Open Source Code | No | The paper mentions 'Publicly available code (Vedaldi and Fulkerson 2010; Bastan et al. 2010) could be used to extract these features', which refers to third-party code. It does not provide any link or explicit statement about releasing the source code for the proposed mPGBN model. |
| Open Datasets | Yes | We use in our experiments the MIR-Flickr data set (Huiskes and Lew 2008), which consists of 1 million images along with their user-assigned tags that are retrieved from the social photography website Flickr. |
| Dataset Splits | No | The paper specifies training and testing splits ('15k image-text pairs are used for training and the remaining 10k pairs for testing'; see the split sketch after the table), but it does not define a distinct validation split for hyperparameter tuning or early stopping. The network-selection sweep over depths and widths may implicitly serve that role, but it is never named as a validation set. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used to run the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions 'Publicly available code (Vedaldi and Fulkerson 2010; Bastan et al. 2010)' for feature extraction, but does not provide specific version numbers for these or any other software dependencies crucial for replication (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | For hyper-parameters, we set η_t = 0.05 for all t, a_0 = b_0 = 0.01, and e_0 = f_0 = 1. We use 15k image-text pairs randomly selected from MIR-Flickr 25k to infer a set of networks with T ∈ {1, 2, 3, 4, 5} and K_t ∈ {50, 100, 200, 400, 800}, and apply the upward-downward Gibbs sampler to collect 200 MCMC samples after 200 burn-in to estimate the posterior mean of the latent representation of each test data sample. We choose a two-hidden-layer mPGBN, with 1024 hidden units in both hidden layers. We use 1000 Gibbs sampling iterations to train the mPGBN on the 15k training image-text pairs, and retain the inferred network (global variables) of the last sample. For each test image-text pair, we collect 500 MCMC samples after 500 burn-in iterations to infer its latent representation (local variables) under the network retained after training. (These settings are gathered into the configuration sketch after the table.) |
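Since the paper gives the model only as equations, the following is a minimal sketch of ancestral sampling from a PGBN-style multimodal generative model: gamma-distributed hidden layers chained through Dirichlet-distributed factor loadings, with modality-specific Poisson observations at layer 1. The sizes, variable names, and NumPy layout are our own assumptions, not the authors' code; the 0.05 Dirichlet concentration mirrors the paper's η_t = 0.05.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: V_IMG / V_TXT are the feature-vocabulary sizes of the
# two modalities; K lists the hidden-layer widths from layer 1 up to layer T.
V_IMG, V_TXT, K = 500, 200, [200, 100, 50]
T = len(K)

# Layer-1 factor loadings are modality specific; deeper layers are shared.
# Each column of a loading matrix is a Dirichlet-distributed topic.
Phi_img = rng.dirichlet(np.full(V_IMG, 0.05), size=K[0]).T       # V_IMG x K_1
Phi_txt = rng.dirichlet(np.full(V_TXT, 0.05), size=K[0]).T       # V_TXT x K_1
Phi_deep = [rng.dirichlet(np.full(K[t], 0.05), size=K[t + 1]).T  # K_t x K_{t+1}
            for t in range(T - 1)]

def sample_pair(r=1.0, c=1.0):
    """Draw one image-tag count pair by sampling the network top-down."""
    theta = rng.gamma(r, 1.0 / c, size=K[-1])        # top-layer gamma weights
    for t in range(T - 2, -1, -1):                   # propagate toward layer 1
        theta = rng.gamma(Phi_deep[t] @ theta, 1.0 / c)
    x_img = rng.poisson(Phi_img @ theta)             # image-feature counts
    x_txt = rng.poisson(Phi_txt @ theta)             # tag counts
    return x_img, x_txt

x_img, x_txt = sample_pair()
```

Because the two modalities share every hidden layer, conditioning on one modality and sampling the other is what lets the mPGBN impute a missing modality for annotation and retrieval.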
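For the split quoted in the Dataset Splits row, here is a one-step reconstruction of the 15k/10k partition of MIR-Flickr 25k. The seed and index layout are assumptions; the paper says only that the 15k training pairs were randomly selected.

```python
import numpy as np

rng = np.random.default_rng(0)                     # seed is an assumption
idx = rng.permutation(25_000)                      # MIR-Flickr 25k pairs
train_idx, test_idx = idx[:15_000], idx[15_000:]   # 15k train / 10k test
```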
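Finally, the quoted experiment settings gathered in one place. The values are the ones reported in the paper; the dict layout and key names are our own shorthand.

```python
# Settings reported in the paper; key names are our own shorthand.
MPGBN_CONFIG = {
    "eta_t": 0.05,                  # Dirichlet prior on each layer's loadings
    "a0": 0.01, "b0": 0.01,         # gamma hyper-priors
    "e0": 1.0, "f0": 1.0,
    # Network-selection sweep:
    "depths": [1, 2, 3, 4, 5],              # T
    "widths": [50, 100, 200, 400, 800],     # K_t
    "sweep_burn_in": 200, "sweep_collect": 200,
    # Annotation/retrieval model:
    "hidden_layers": 2,
    "hidden_units": [1024, 1024],
    "train_gibbs_iters": 1000,      # network (global variables) from last sample
    "test_burn_in": 500, "test_collect": 500,   # per test image-text pair
    "train_pairs": 15_000, "test_pairs": 10_000,
}
```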