Scaling Riemannian Diffusion Models
Authors: Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that our improved Riemannian Diffusion Models improve performance and scale to high-dimensional real-world tasks. For example, we can faithfully learn the Wilson action on 4×4 SU(3) lattices (128 dimensions). Furthermore, when applied to contrastively learned hyperspherical embeddings (127 dimensions), our method enables better model interpretability by recovering the collapsed projection head representations. |
| Researcher Affiliation | Academia | Aaron Lou, Minkai Xu, Adam Farris, Stefano Ermon; Stanford University; {aaronlou, minkai, adfarris, ermon}@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Heat Kernel Computation |
| Open Source Code | Yes | Code found at https://github.com/louaaron/Scaling-Riemannian-Diffusion |
| Open Datasets | Yes | We test on the compiled Earth science datasets from [38]... Table 2: We compare contrastive learning OOD detection methods on CIFAR-100. |
| Dataset Splits | No | The paper mentions using datasets for training and testing (e.g., 'We test on the compiled Earth science datasets from [38]', 'We train for 100000 gradient updates...'), but it does not provide specific details on train/validation/test splits (e.g., exact percentages, sample counts, or explicit references to predefined standard splits) that would be needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions various components like the 'SiLU activation function' and 'Adam optimizer' but does not specify software versions for any libraries or frameworks used (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | We use a very similar architecture to the one used in Bortoli et al. [4] except we use the SiLU activation function without a learnable parameter [23] and a learning rate of 5 × 10⁻⁴. We use a standard MLP with 4 hidden layers and the SiLU activation function and learn with the Adam optimizer with learning rate set to 1e-3 [31]. ... We train for 100000 gradient updates with a batch size of 100. We train with a learning rate of 5 × 10⁻⁴ and perform 1000000 updates with a batch size of 512. |
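
The Research Type row refers to learning the Wilson action on SU(3) lattices. For orientation only, the sketch below evaluates the standard Wilson gauge action on a 2D periodic lattice of SU(3) link variables; the coupling `beta`, the lattice shape, and the function name are assumptions for illustration, and this is not the paper's learned model, just the target energy the quoted result refers to.

```python
import numpy as np

def wilson_action(U, beta=1.0):
    """Wilson gauge action on a 2D periodic L x L lattice with SU(3) links.

    U: complex array of shape (L, L, 2, 3, 3); U[x, y, mu] is the link
    leaving site (x, y) in direction mu (0 = x, 1 = y).
    Returns S = beta * sum over plaquettes of (1 - Re Tr(U_P) / 3).
    """
    U0, U1 = U[:, :, 0], U[:, :, 1]          # links in the x and y directions
    U0_shift_y = np.roll(U0, -1, axis=1)      # U_x(x, y+1)
    U1_shift_x = np.roll(U1, -1, axis=0)      # U_y(x+1, y)
    # Plaquette: U_x(x,y) U_y(x+1,y) U_x(x,y+1)^dagger U_y(x,y)^dagger
    P = (U0 @ U1_shift_x
         @ np.conj(np.swapaxes(U0_shift_y, -1, -2))
         @ np.conj(np.swapaxes(U1, -1, -2)))
    re_tr = np.trace(P, axis1=-2, axis2=-1).real
    return beta * np.sum(1.0 - re_tr / 3.0)
```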
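
The Pseudocode row cites "Algorithm 1: Heat Kernel Computation" without reproducing it. As a minimal point of reference, the sketch below shows one standard way to evaluate the heat kernel on a hypersphere via a truncated Gegenbauer eigenfunction expansion. The function name, the truncation length `n_terms`, and the ∂ₜp = Δp convention are assumptions, not the paper's actual algorithm, which may rescale time and handle small t where the series converges slowly.

```python
import numpy as np
from scipy.special import eval_gegenbauer, gamma

def sphere_heat_kernel(t, cos_xy, d, n_terms=50):
    """Truncated series for the heat kernel p_t(x, y) on the unit sphere
    S^{d-1} in R^d (valid for d >= 3), evaluated at cos_xy = <x, y>:

        p_t = sum_l exp(-l(l+d-2) t) * (2l+d-2)/(d-2)
                  * C_l^{(d-2)/2}(<x, y>) / vol(S^{d-1}).
    """
    alpha = (d - 2) / 2.0
    surface_area = 2.0 * np.pi ** (d / 2.0) / gamma(d / 2.0)
    total = np.zeros_like(np.asarray(cos_xy, dtype=float))
    for l in range(n_terms):
        eigval = l * (l + d - 2)            # Laplace-Beltrami eigenvalue
        weight = (2 * l + d - 2) / (d - 2)  # multiplicity / addition-theorem factor
        total += np.exp(-eigval * t) * weight * eval_gegenbauer(l, alpha, cos_xy)
    return total / surface_area
```

As t grows, only the l = 0 term survives and the kernel approaches the uniform density 1 / vol(S^{d-1}), which is a quick sanity check on any implementation.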
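
The Experiment Setup row describes a standard MLP with 4 hidden layers, SiLU activations, and Adam at learning rate 1e-3. A hedged PyTorch sketch of that kind of score network is given below; the hidden width (512), the input/output dimensions, and the way the diffusion time is concatenated to the input are assumptions, since the paper does not quote those details here.

```python
import torch
import torch.nn as nn

class ScoreMLP(nn.Module):
    """MLP with 4 hidden layers and SiLU activations, per the quoted setup."""
    def __init__(self, in_dim, out_dim, hidden=512):  # hidden width assumed
        super().__init__()
        layers, d = [], in_dim + 1  # +1 for the diffusion time input (assumed)
        for _ in range(4):          # 4 hidden layers, per the quoted setup
            layers += [nn.Linear(d, hidden), nn.SiLU()]
            d = hidden
        layers.append(nn.Linear(d, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x, t):
        return self.net(torch.cat([x, t[:, None]], dim=-1))

model = ScoreMLP(in_dim=3, out_dim=3)                      # e.g. points on S^2 in R^3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr from the quoted setup
```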