Scaling Riemannian Diffusion Models

Authors: Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon

NeurIPS 2023

Each reproducibility variable below is listed with its assessed result and the supporting LLM response (a paper excerpt or an explanation).

Research Type: Experimental
"We empirically demonstrate that our improved Riemannian diffusion models improve performance and scale to high-dimensional real-world tasks. For example, we can faithfully learn the Wilson action on 4×4 SU(3) lattices (128 dimensions). Furthermore, when applied to contrastively learned hyperspherical embeddings (127 dimensions), our method enables better model interpretability by recovering the collapsed projection head representations."

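The two dimension counts in this excerpt can be sanity-checked with a short calculation. The reading assumed below (one SU(3)-valued variable per site of the 4×4 lattice, and unit-norm embeddings in R^128 so that the hypersphere is S^127) is an assumption for illustration, not a detail stated in the excerpt:

```python
# Hypothetical sanity check on the dimensions quoted above; the
# "one SU(3) variable per lattice site" reading is an assumption.

def su_n_dim(n: int) -> int:
    """Manifold dimension of SU(n): n^2 - 1 real parameters."""
    return n * n - 1

lattice_sites = 4 * 4               # 4x4 lattice
print(lattice_sites * su_n_dim(3))  # 16 * 8 = 128
print(128 - 1)                      # S^127, the unit sphere in R^128, has dimension 127
```
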
Researcher Affiliation: Academia
Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon (Stanford University); {aaronlou, minkai, adfarris, ermon}@stanford.edu

Pseudocode: Yes
"Algorithm 1: Heat Kernel Computation"

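Algorithm 1 itself is not reproduced on this page. As a rough illustration of the eigenfunction-expansion approach that heat-kernel computations on compact manifolds typically rest on (a sketch only, not the paper's algorithm; the truncation length and the convention dp/dt = d^2p/dtheta^2 are assumptions), here is the truncated series for the circle S^1:

```python
import numpy as np

def heat_kernel_s1(theta: np.ndarray, t: float, n_terms: int = 50) -> np.ndarray:
    """Truncated eigenfunction expansion of the heat kernel on the circle S^1.

    Convention: solves dp/dt = d^2 p/dtheta^2 started from a point mass at 0,
    so p_t(theta) = (1/2pi) * (1 + 2 * sum_{n>=1} exp(-n^2 t) * cos(n*theta)).
    (For Brownian motion with generator Laplacian/2, replace t by t/2.)
    """
    n = np.arange(1, n_terms + 1)
    # Each column n is exp(-n^2 t) * cos(n * theta); shape (len(theta), n_terms).
    series = np.exp(-(n ** 2) * t) * np.cos(np.outer(theta, n))
    return (1.0 + 2.0 * series.sum(axis=1)) / (2.0 * np.pi)

# The density should integrate to ~1 over the circle and flatten toward
# the uniform density 1/(2*pi) as t grows.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
print(heat_kernel_s1(theta, t=0.5).mean() * 2.0 * np.pi)  # ~1.0
```
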
Open Source Code: Yes
Code is available at https://github.com/louaaron/Scaling-Riemannian-Diffusion

Open Datasets: Yes
"We test on the compiled Earth science datasets from [38]..." and "Table 2: We compare contrastive learning OOD detection methods on CIFAR-100."

Dataset Splits: No
The paper mentions training and testing on datasets (e.g., "We test on the compiled Earth science datasets from [38]"; "We train for 100,000 gradient updates..."), but it does not provide the train/validation/test splits (exact percentages, sample counts, or references to predefined standard splits) needed for reproduction.

Hardware Specification: No
The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.

Software Dependencies: No
The paper mentions components such as the SiLU activation function and the Adam optimizer, but it does not specify software versions for any of the libraries or frameworks used (e.g., Python, PyTorch, TensorFlow).

Experiment Setup: Yes
"We use a very similar architecture to the one used in De Bortoli et al. [4] except we use the SiLU activation function without a learnable parameter [23] and a learning rate of 5e-4." "We use a standard MLP with 4 hidden layers and the SiLU activation function and learn with the Adam optimizer with learning rate set to 1e-3 [31]." ... "We train for 100,000 gradient updates with a batch size of 100. We train with a learning rate of 5e-4 and perform 1,000,000 updates with a batch size of 512."

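The excerpt pins down activations, depth, optimizer, and learning rates but not layer widths or input dimensionality. A minimal PyTorch sketch of this kind of setup follows; the hidden width, input dimensionality, and the squared-error placeholder target are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

DIM, HIDDEN, DEPTH = 128, 512, 4   # DIM and HIDDEN are assumed for illustration

# A standard MLP with 4 hidden layers and SiLU activations, as the excerpt describes.
layers, in_features = [], DIM + 1  # input: a point's coordinates plus the diffusion time t
for _ in range(DEPTH):
    layers += [nn.Linear(in_features, HIDDEN), nn.SiLU()]
    in_features = HIDDEN
layers.append(nn.Linear(HIDDEN, DIM))
score_net = nn.Sequential(*layers)

# Adam with learning rate 1e-3, matching the excerpt.
optimizer = torch.optim.Adam(score_net.parameters(), lr=1e-3)

# One gradient update with batch size 100; the regression target is a placeholder.
x, t = torch.randn(100, DIM), torch.rand(100, 1)
loss = ((score_net(torch.cat([x, t], dim=-1)) - torch.randn(100, DIM)) ** 2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.4f}")
```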