Multimodal Poisson Gamma Belief Network

Authors: Chaojie Wang, Bo Chen, Mingyuan Zhou

Venue: AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on bi-modal data consisting of images and tags show that the mPGBN can easily impute a missing modality, and hence is useful for both image annotation and retrieval. We further demonstrate that the mPGBN achieves state-of-the-art results in unsupervised extraction of latent features from multimodal data.
Researcher Affiliation | Academia | Chaojie Wang and Bo Chen: National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing & Understanding, Xidian University, Xi'an, Shaanxi, China. Mingyuan Zhou: McCombs School of Business, University of Texas at Austin, Austin, TX 78712, USA.
Pseudocode | No | The paper provides mathematical formulations of the model but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper notes that 'publicly available code (Vedaldi and Fulkerson 2010; Bastan et al. 2010) could be used to extract these features', which refers to third-party feature-extraction code. It provides no link to, or explicit statement about releasing, the source code for the proposed mPGBN model.
Open Datasets | Yes | 'We use in our experiments the MIR-Flickr data set (Huiskes and Lew 2008), which consists of 1 million images along with their user-assigned tags that are retrieved from the social photography website Flickr.'
Dataset Splits | No | The paper specifies training and testing splits ('15k image-text pairs are used for training and the remaining 10k pairs for testing'), but it does not state a distinct validation split for hyperparameter tuning or early stopping; the internal evaluation used to 'infer a set of networks' may implicitly serve that role without being named a validation set. (A sketch of the stated split appears after this table.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions 'publicly available code (Vedaldi and Fulkerson 2010; Bastan et al. 2010)' for feature extraction, but does not provide version numbers for these or for any other software dependencies crucial to replication (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | For hyper-parameters, the authors set ηt = 0.05 for all t, a0 = b0 = 0.01, and e0 = f0 = 1. They use 15k image-text pairs randomly selected from MIR-Flickr 25k to infer a set of networks with T ∈ {1, 2, 3, 4, 5} and Kt ∈ {50, 100, 200, 400, 800}, and apply the upward-downward Gibbs sampler to collect 200 MCMC samples after 200 burn-in iterations to estimate the posterior mean of the latent representation of each test sample. They further choose a two-hidden-layer mPGBN with 1024 hidden units in both hidden layers, train it with 1000 Gibbs sampling iterations on the 15k training image-text pairs, and retain the inferred network (global variables) from the last sample. For each test image-text pair, they collect 500 MCMC samples after 500 burn-in iterations to infer its latent representation (local variables) under the retained network. (Code sketches of this protocol follow the table.)
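
As noted in the Dataset Splits row, the paper describes a 15k/10k train/test split of MIR-Flickr 25k but publishes neither a random seed nor an index list. Below is a minimal sketch of such a split, assuming the 25k image-text pairs are already loaded into a list; the function name, signature, and fixed seed are our assumptions:

```python
# Hypothetical sketch of the 15k/10k split described in the paper.
# `pairs` stands in for the 25k MIR-Flickr image-text pairs; the fixed
# seed is our assumption, since the paper does not publish one.
import random

def split_mir_flickr(pairs, n_train=15000, seed=0):
    """Randomly select n_train pairs for training; the rest form the test set."""
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    train = [pairs[i] for i in indices[:n_train]]
    test = [pairs[i] for i in indices[n_train:]]
    return train, test
```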
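The quoted experiment setup relies on the upward-downward Gibbs sampler, whose implementation is not released. As a heavily simplified but runnable illustration, the sketch below implements the single-layer (T = 1), two-modality special case: image and tag counts are each factorized over modality-specific topic matrices (Phi1, Phi2) that share one set of latent weights (Theta), using the standard multinomial-augmented Gibbs updates for Poisson factorization. This is our simplification for illustration, not the authors' multilayer sampler; all names and the gamma hyper-parameters r and c are assumptions, and only the Dirichlet prior ηt = 0.05 comes from the quoted setup.

```python
# Runnable toy sketch: single-layer (T = 1), two-modality Poisson factor
# model sharing one set of latent weights Theta -- a simplification of the
# mPGBN for illustration only. eta follows the quoted setup; r and c are
# our assumed gamma hyper-parameters.
import numpy as np

rng = np.random.default_rng(0)

def gibbs_mpgbn_1layer(X_img, X_txt, K=50, n_iter=200, eta=0.05, r=1.0, c=1.0):
    """X_img (V1, N) and X_txt (V2, N): integer count matrices over N pairs."""
    V1, N = X_img.shape
    V2, _ = X_txt.shape
    Phi1 = rng.dirichlet(np.full(V1, eta), size=K).T   # (V1, K); columns sum to 1
    Phi2 = rng.dirichlet(np.full(V2, eta), size=K).T   # (V2, K)
    Theta = rng.gamma(r, 1.0 / c, size=(K, N))         # shared latent weights

    def allocate(X, Phi):
        """Augment each count into K latent counts; return sufficient statistics."""
        A_vk = np.zeros((X.shape[0], K))               # counts per (feature, topic)
        A_kn = np.zeros((K, N))                        # counts per (topic, pair)
        for v, n in zip(*np.nonzero(X)):
            p = Phi[v] * Theta[:, n]
            cnt = rng.multinomial(int(X[v, n]), p / p.sum())
            A_vk[v] += cnt
            A_kn[:, n] += cnt
        return A_vk, A_kn

    for _ in range(n_iter):
        A1_vk, A1_kn = allocate(X_img, Phi1)
        A2_vk, A2_kn = allocate(X_txt, Phi2)
        for k in range(K):                             # Dirichlet posterior per topic
            Phi1[:, k] = rng.dirichlet(eta + A1_vk[:, k])
            Phi2[:, k] = rng.dirichlet(eta + A2_vk[:, k])
        # Theta sees latent counts from BOTH modalities; since each Phi column
        # sums to 1, each modality adds rate Theta_kn, hence scale 1/(c + 2).
        Theta = rng.gamma(r + A1_kn + A2_kn, 1.0 / (c + 2.0))
    return Phi1, Phi2, Theta
```

Training as quoted would correspond to calling this with the 15k training pairs and n_iter=1000, then retaining Phi1 and Phi2 as the global variables.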
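Continuing the same toy setup (this reuses np, rng, and the trained Phi1 and Phi2 from the previous sketch), test-time inference holds the global variables fixed and resamples only the local weights of one test pair, averaging 500 collected samples after 500 burn-in iterations as in the quoted protocol. The helper name and signature are again ours:

```python
def infer_test_theta(x_img, x_txt, Phi1, Phi2, r=1.0, c=1.0,
                     burn_in=500, collect=500):
    """x_img (V1,) and x_txt (V2,): integer count vectors for one test pair."""
    K = Phi1.shape[1]
    theta = rng.gamma(r, 1.0 / c, size=K)     # local variables only
    total = np.zeros(K)
    for it in range(burn_in + collect):
        m = np.zeros(K)                       # per-topic latent counts
        for x, Phi in ((x_img, Phi1), (x_txt, Phi2)):
            for v in np.nonzero(x)[0]:
                p = Phi[v] * theta
                m += rng.multinomial(int(x[v]), p / p.sum())
        theta = rng.gamma(r + m, 1.0 / (c + 2.0))
        if it >= burn_in:
            total += theta
    return total / collect                    # posterior-mean latent representation
```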