Scaling Riemannian Diffusion Models
Authors: Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our improved Riemannian Diffusion Models improve performance and scale to high-dimensional real-world tasks. For example, we can faithfully learn the Wilson action on 4×4 SU(3) lattices (128 dimensions). Furthermore, when applied to contrastively learned hyperspherical embeddings (127 dimensions), our method enables better model interpretability by recovering the collapsed projection head representations. |
| Researcher Affiliation | Academia | Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon; Stanford University; {aaronlou, minkai, adfarris, ermon}@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Heat Kernel Computation |
| Open Source Code | Yes | Code found at https://github.com/louaaron/Scaling-Riemannian-Diffusion |
| Open Datasets | Yes | We test on the compiled Earth science datasets from [38]... Table 2: We compare contrastive learning OOD detection methods on CIFAR-100. |
| Dataset Splits | No | The paper mentions using datasets for training and testing (e.g., 'We test on the compiled Earth science datasets from [38]', 'We train for 100000 gradient updates...'), but it does not provide specific details on train/validation/test splits (e.g., exact percentages, sample counts, or explicit references to predefined standard splits) that would be needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various components like the 'SiLU activation function' and 'Adam optimizer' but does not specify software versions for any libraries or frameworks used (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | We use a very similar architecture to the one used in Bortoli et al. [4] except we use the SiLU activation function without a learnable parameter [23] and a learning rate of 5 × 10⁻⁴. We use a standard MLP with 4 hidden layers and the SiLU activation function and learn with the Adam optimizer with learning rate set to 1e-3 [31]. ... We train for 100000 gradient updates with a batch size of 100. We train with a learning rate of 5 × 10⁻⁴ and perform 1000000 updates with a batch size of 512. |
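
The Research Type row refers to learning the Wilson action on SU(3) lattices. For orientation only, the sketch below evaluates the standard Wilson gauge action on a 2D periodic lattice of SU(3) link variables; the coupling `beta`, the lattice shape, and the function name are assumptions for illustration, and this is not the paper's learned model, just the target energy the quoted result refers to.

```python
import numpy as np

def wilson_action(U, beta=1.0):
    """Wilson gauge action on a 2D periodic L x L lattice with SU(3) links.

    U: complex array of shape (L, L, 2, 3, 3); U[x, y, mu] is the link
    leaving site (x, y) in direction mu (0 = x, 1 = y).
    Returns S = beta * sum over plaquettes of (1 - Re Tr(U_P) / 3).
    """
    U0, U1 = U[:, :, 0], U[:, :, 1]          # links in the x and y directions
    U0_shift_y = np.roll(U0, -1, axis=1)      # U_x(x, y+1)
    U1_shift_x = np.roll(U1, -1, axis=0)      # U_y(x+1, y)
    # Plaquette: U_x(x,y) U_y(x+1,y) U_x(x,y+1)^dagger U_y(x,y)^dagger
    P = (U0 @ U1_shift_x
         @ np.conj(np.swapaxes(U0_shift_y, -1, -2))
         @ np.conj(np.swapaxes(U1, -1, -2)))
    re_tr = np.trace(P, axis1=-2, axis2=-1).real
    return beta * np.sum(1.0 - re_tr / 3.0)
```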
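
The Pseudocode row cites "Algorithm 1: Heat Kernel Computation" without reproducing it. As a minimal point of reference, the sketch below shows one standard way to evaluate the heat kernel on a hypersphere via a truncated Gegenbauer eigenfunction expansion. The function name, the truncation length `n_terms`, and the ∂ₜp = Δp convention are assumptions, not the paper's actual algorithm, which may rescale time and handle small t where the series converges slowly.

```python
import numpy as np
from scipy.special import eval_gegenbauer, gamma

def sphere_heat_kernel(t, cos_xy, d, n_terms=50):
    """Truncated series for the heat kernel p_t(x, y) on the unit sphere
    S^{d-1} in R^d (valid for d >= 3), evaluated at cos_xy = <x, y>:

        p_t = sum_l exp(-l(l+d-2) t) * (2l+d-2)/(d-2)
                  * C_l^{(d-2)/2}(<x, y>) / vol(S^{d-1}).
    """
    alpha = (d - 2) / 2.0
    surface_area = 2.0 * np.pi ** (d / 2.0) / gamma(d / 2.0)
    total = np.zeros_like(np.asarray(cos_xy, dtype=float))
    for l in range(n_terms):
        eigval = l * (l + d - 2)            # Laplace-Beltrami eigenvalue
        weight = (2 * l + d - 2) / (d - 2)  # multiplicity / addition-theorem factor
        total += np.exp(-eigval * t) * weight * eval_gegenbauer(l, alpha, cos_xy)
    return total / surface_area
```

As t grows, only the l = 0 term survives and the kernel approaches the uniform density 1 / vol(S^{d-1}), which is a quick sanity check on any implementation.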
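
The Experiment Setup row describes a standard MLP with 4 hidden layers, SiLU activations, and Adam at learning rate 1e-3. A hedged PyTorch sketch of that kind of score network is given below; the hidden width (512), the input/output dimensions, and the way the diffusion time is concatenated to the input are assumptions, since the paper does not quote those details here.

```python
import torch
import torch.nn as nn

class ScoreMLP(nn.Module):
    """MLP with 4 hidden layers and SiLU activations, per the quoted setup."""
    def __init__(self, in_dim, out_dim, hidden=512):  # hidden width assumed
        super().__init__()
        layers, d = [], in_dim + 1  # +1 for the diffusion time input (assumed)
        for _ in range(4):          # 4 hidden layers, per the quoted setup
            layers += [nn.Linear(d, hidden), nn.SiLU()]
            d = hidden
        layers.append(nn.Linear(d, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))

model = ScoreMLP(in_dim=3, out_dim=3)                      # e.g. points on S^2 in R^3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr from the quoted setup
```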