Diffusing Gaussian Mixtures for Generating Categorical Data

Authors: Florence Regol, Mark Coates

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.
Researcher Affiliation | Academia | Florence Regol, Mark Coates, Dept. Electrical and Computer Engineering, McGill University, Montréal, QC, Canada. florence.robert-regol@mail.mcgill.ca, mark.coates@mcgill.ca
Pseudocode | No | Algorithms detailing the training and sampling procedures are provided in the supplementary.
Open Source Code | Yes | The source code is available at https://github.com/networkslab/gmcd.
Open Datasets | Yes | As a real-world application, we measure the performance of the models on two protein datasets from the Pfam protein family: PF00076, which contains N = 137,605 proteins of length S = 70, and PF00014, which contains N = 13,600 proteins of length S = 53. The number of categories for both datasets corresponds to the list of amino acids, K = 21.
Dataset Splits | Yes | A split of 70/20/10 is used for the protein datasets.
Hardware Specification | Yes | Experiments are conducted on GPU machines (NVIDIA GeForce RTX 2060).
Software Dependencies | No | The paper mentions using the RAdam optimizer but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | No | The paper states, 'We provide a complete description of architectures, the hyperparameters selection procedure in the supplementary,' indicating that specific experimental setup details like hyperparameter values are not in the main text.
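
The 70/20/10 split reported for the protein datasets can be illustrated with a minimal sketch. This is not the authors' code. The function name `split_indices`, the random shuffle, the seed, and the reading of 70/20/10 as train/validation/test are all assumptions made here for illustration; the summary above does not state how the split was drawn.

```python
import random


def split_indices(n, percents=(70, 20, 10), seed=0):
    """Shuffle range(n) and cut it into three index lists.

    Hypothetical helper: the 70/20/10 order is assumed to mean
    train/validation/test, and the shuffle and seed are illustrative.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the last part
    return train, val, test


# N = 13,600 is the size reported for PF00014.
train, val, test = split_indices(13600)
print(len(train), len(val), len(test))
```

Any remainder from integer division goes to the last part, so the three lists always cover all N sequences exactly once.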