Diffusing Gaussian Mixtures for Generating Categorical Data
Authors: Florence Regol, Mark Coates
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets. |
| Researcher Affiliation | Academia | Florence Regol, Mark Coates; Dept. of Electrical and Computer Engineering, McGill University, Montréal, QC, Canada; florence.robert-regol@mail.mcgill.ca, mark.coates@mcgill.ca |
| Pseudocode | No | Algorithms detailing the training and sampling procedures are provided in the supplementary material. |
| Open Source Code | Yes | The source code is available at https://github.com/networkslab/gmcd. |
| Open Datasets | Yes | As a real-world application, we measure the performance of the models on two protein datasets from the Pfam protein family: PF00076, which contains N = 137,605 proteins of length S = 70, and PF00014, which contains N = 13,600 proteins of length S = 53. The number of categories for both datasets corresponds to the list of amino acids, K = 21. |
| Dataset Splits | Yes | A split of 70/20/10 is used for the protein datasets. |
| Hardware Specification | Yes | Experiments are conducted on GPU machines (NVIDIA GeForce RTX 2060). |
| Software Dependencies | No | The paper mentions using the RAdam optimizer but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper states, 'We provide a complete description of architectures, the hyperparameters selection procedure in the supplementary,' indicating that specific experimental setup details like hyperparameter values are not in the main text. |
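The 70/20/10 split reported for the protein datasets can be made concrete with a short sketch. It assumes a uniform random permutation of the sequences, integer (floor) rounding for the train and validation sizes with the remainder assigned to the test set, and a fixed seed; the function name `split_indices` and the seed are hypothetical, and the paper's actual shuffling and rounding are not stated in this report.

```python
import numpy as np


def split_indices(n, percents=(70, 20, 10), seed=0):
    """Partition range(n) into train/val/test index arrays.

    Hypothetical helper: uses a uniform random permutation, floor-rounded
    train/val sizes (integer arithmetic avoids float rounding surprises),
    and gives the remainder to the test set. The seed is an assumption.
    """
    assert sum(percents) == 100, "split percentages must sum to 100"
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]  # remainder absorbs rounding leftovers
    return train, val, test


if __name__ == "__main__":
    # Dataset sizes N taken from the "Open Datasets" row above.
    for name, n in [("PF00076", 137_605), ("PF00014", 13_600)]:
        train, val, test = split_indices(n)
        print(name, len(train), len(val), len(test))
    # PF00076 96323 27521 13761
    # PF00014 9520 2720 1360
```

Assigning the rounding remainder to the last subset guarantees that every sequence lands in exactly one split, which matters for PF00076, where 70% of N is not an integer.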