
Diffusing Gaussian Mixtures for Generating Categorical Data

Authors: Florence Regol, Mark Coates

AAAI 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our method of evaluation highlights the capabilities and limitations of different generative models for generating categorical data, and includes experiments on synthetic and real-world protein datasets.
Researcher Affiliation	Academia	Florence Regol, Mark Coates, Dept. Electrical and Computer Engineering, McGill University, Montréal, QC, Canada; florence.robert-regol@mail.mcgill.ca, mark.coates@mcgill.ca
Pseudocode	No	Algorithms detailing the training and sampling procedures are provided in the supplementary.
Open Source Code	Yes	The source code is available at https://github.com/networkslab/gmcd.
Open Datasets	Yes	As a real world application, we measure the performance of the models on two protein datasets from the Pfam protein family: PF00076, which contains N = 137,605 proteins of length S = 70, and PF00014, which contains N = 13,600 proteins of length S = 53. The number of categories for both datasets corresponds to the list of amino acids, K = 21.
Dataset Splits	Yes	A split of 70/20/10 is used for the protein datasets.
Hardware Specification	Yes	Experiments are conducted on NVIDIA GeForce RTX 2060 GPU machines.
Software Dependencies	No	The paper mentions using the RAdam optimizer but does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	No	The paper states, 'We provide a complete description of architectures, the hyperparameters selection procedure in the supplementary,' indicating that specific experimental setup details like hyperparameter values are not in the main text.
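
To make the reported 70/20/10 split concrete, the following is a minimal sketch of how such a split could be produced for a dataset of N sequences. It is not taken from the paper or its repository: the function name `split_indices`, the seeded random shuffle, the integer rounding, and the reading of the three parts as train/validation/test in that order are all assumptions made for illustration.

```python
import random


def split_indices(n, percents=(70, 20, 10), seed=0):
    """Shuffle range(n) and cut it into three consecutive parts.

    Hypothetical helper: `percents` is assumed to mean
    train/validation/test, which the source does not state.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n))
    # A seeded shuffle keeps the split repeatable across runs.
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_first = n * percents[0] // 100
    n_second = n * percents[1] // 100
    first = indices[:n_first]
    second = indices[n_first:n_first + n_second]
    # The last part takes whatever remains, so no index is dropped.
    third = indices[n_first + n_second:]
    return first, second, third


# N = 137,605 is the size reported for PF00076.
train, val, test = split_indices(137_605)
print(len(train), len(val), len(test))
```

Because the last part absorbs the rounding remainder, the three parts always cover every index exactly once; the exact sizes used by the authors may differ.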