Semi-crowdsourced Clustering with Deep Generative Models
Authors: Yucen Luo, Tian Tian, Jiaxin Shi, Jun Zhu, Bo Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods. |
| Researcher Affiliation | Academia | Yucen Luo, Tian Tian, Jiaxin Shi, Jun Zhu, Bo Zhang. Dept. of Comp. Sci. & Tech., Institute for AI, THBI Lab, BNRist Center, State Key Lab for Intell. Tech. & Sys., Tsinghua University, Beijing, China. {luoyc15,shijx15}@mails.tsinghua.edu.cn, rossowhite@163.com, {dcszj,dcszb}@mail.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Semi-crowdsourced clustering with DGMs (Bayes SCDC). Input: observations O = {o1, ..., oN}, annotations L(1:M), variational parameters (ηΘ, γ, φ). Repeat: compute ψi ← r(oi; φ), the statistics t(xi), for i = 1, ..., N; for each local variational parameter ηxi and ηzi, update alternately using eq. (12) and eq. (14); sample x̂i ∼ q(xi), i = 1, ..., N; use x̂i to approximate Eq(x) log p(o\|x; γ) in the lower bound J, eq. (16); update the global variational parameters ηΘ using the natural gradient in eq. (17); update φ, γ using ∇φ,γ J(ηΘ; φ, γ); until convergence. |
| Open Source Code | Yes | Code is available at https://github.com/xinmei9322/semicrowd. Part of the implementation is based on ZhuSuan [22]. |
| Open Datasets | Yes | We test on the Face dataset [7], MNIST, and CIFAR-10. |
| Dataset Splits | No | The paper mentions training, but does not explicitly provide details about validation splits or percentages for any of the datasets used beyond the general experimental setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions that "Part of the implementation is based on Zhu Suan [22]" but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | For Bayes SCDC, a non-informative prior Beta(1, 1) is placed over α, β. For fair comparison, we also randomly sample the initial accuracy parameters α, β from Beta(1, 1) for SCDC. We average the results of 5 runs. In each run we randomly initialize the model for 10 times and pick the best result. All models are trained for 200 epochs with minibatch size of 128 for each random initialization. |
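The quoted Algorithm 1 alternates local variational updates, Monte Carlo sampling, and a global (natural-gradient) step. The following is a minimal structural sketch of that loop, not the authors' implementation: the feature map, the softmax responsibilities, and the plain-gradient global update are hypothetical stand-ins for the paper's recognition network r(oi; φ) and eqs. (12), (14), (16), (17).

```python
import numpy as np

rng = np.random.default_rng(0)


def bayes_scdc_sketch(O, K=3, epochs=5, lr=0.1):
    """Structural sketch of the Bayes SCDC training loop (Algorithm 1).

    O : (N, D) array of observations.
    All update rules below are simplified placeholders for the paper's
    equations; only the control flow mirrors the pseudocode.
    """
    n, d = O.shape
    phi = rng.normal(scale=0.1, size=(d, K))        # recognition params (phi)
    eta_theta = rng.normal(scale=0.1, size=(d, K))  # global variational params

    for _ in range(epochs):
        psi = O @ phi  # stand-in for psi_i <- r(o_i; phi), the statistics t(x_i)

        # Alternating local updates (stand-in for eqs. (12) and (14)):
        # here a softmax over cluster scores plays the role of eta_z.
        for _ in range(3):
            logits = psi + O @ eta_theta
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            eta_z = np.exp(logits)
            eta_z /= eta_z.sum(axis=1, keepdims=True)

        # Sample x_hat ~ q(x) to approximate E_q[log p(o|x; gamma)] in J (eq. (16)).
        x_hat = eta_z + 0.01 * rng.normal(size=eta_z.shape)

        # Global step: a plain gradient step standing in for the
        # natural-gradient update of eta_theta in eq. (17).
        eta_theta += lr * (O.T @ x_hat / n - eta_theta)

        # Update phi (stand-in for the gradient of J w.r.t. phi, gamma).
        phi += lr * (O.T @ eta_z / n - phi)

    return eta_z, eta_theta


# Toy run on synthetic observations.
O = rng.normal(size=(64, 4))
resp, eta_theta = bayes_scdc_sketch(O)
```

The sketch returns per-point cluster responsibilities (`resp`, rows summing to one) and the global parameters; it only illustrates the alternating local/global structure, omitting the crowdsourced annotations L(1:M) and the worker accuracy parameters α, β entirely.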