Estimating High Order Gradients of the Data Distribution by Denoising
Authors: Chenlin Meng, Yang Song, Wenzhe Li, Stefano Ermon
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that models trained with the proposed method can approximate second order derivatives more efficiently and accurately than via automatic differentiation. Our experiments show that models learned with the proposed objective can approximate second order scores more accurately than applying automatic differentiation to lower order score models. Our approach is also more computationally efficient for high dimensional data, achieving up to 500× speedups for second order score estimation on MNIST. (See the autodiff comparison sketch after the table.) |
| Researcher Affiliation | Academia | Chenlin Meng (Stanford University, chenlin@cs.stanford.edu); Yang Song (Stanford University, yangsong@cs.stanford.edu); Wenzhe Li (Tsinghua University, lwz21@mails.tsinghua.edu.cn); Stefano Ermon (Stanford University, ermon@cs.stanford.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link or an explicit code release statement. |
| Open Datasets | Yes | Our approach is also more computationally efficient for high dimensional data, achieving up to 500× speedups for second order score estimation on MNIST. We visualize the diagonal of the estimated Cov[x | x̃] for MNIST and CIFAR-10 [10] in Fig. 3. |
| Dataset Splits | No | The paper mentions 'test samples' and 'MNIST test set' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We report the wall-clock time averaged over 7 runs used for estimating second order scores during test time on a TITAN Xp GPU in Table 2. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | We parameterize s1 and s2 with the same model architecture and use a batch size of 10 for both settings. We search for the optimal step size for each method and observe that Ozaki sampling can use a larger step size and converge faster than Langevin dynamics (see Fig. 5). L_joint(θ) = L_D2SM(θ) + γ·L_DSM(θ) (Eq. 14), where L_DSM(θ) is defined in Eq. (3) and γ ∈ ℝ≥0 is a tunable coefficient. (See the training-objective sketch after the table.) |
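
The joint objective in Eq. (14) combines second-order denoising score matching (D2SM) with the first-order DSM term. Below is a minimal PyTorch sketch of one loss computation, assuming a hypothetical `score_net` that returns a first-order score `s1` and a diagonal second-order score `s2` for the perturbed input; the diagonal parameterization, the `gamma` default, and the stop-gradient on `s1` are illustrative assumptions, not the authors' released configuration.

```python
import torch

def joint_loss(score_net, x, sigma, gamma=1.0):
    """Sketch of L_joint = L_D2SM + gamma * L_DSM (Eq. 14).

    Assumes `score_net(x_tilde)` (hypothetical) returns the first-order
    score s1 and a diagonal second-order score s2; gamma is illustrative.
    """
    z = torch.randn_like(x)
    x_tilde = x + sigma * z  # sample the Gaussian perturbation kernel q(x_tilde | x)

    s1, s2 = score_net(x_tilde)

    # DSM (Eq. 3): s1 regresses onto the kernel score
    # grad_{x_tilde} log q(x_tilde | x) = -(x_tilde - x) / sigma**2 = -z / sigma.
    l_dsm = ((s1 + z / sigma) ** 2).flatten(1).sum(-1).mean()

    # D2SM: for a diagonal parameterization, Tweedie's formula yields the
    # target (z**2 - 1) / sigma**2 - s1**2; stopping gradients through s1
    # here is a design choice, not something the paper prescribes.
    target2 = (z ** 2 - 1.0) / sigma ** 2 - s1.detach() ** 2
    l_d2sm = ((s2 - target2) ** 2).flatten(1).sum(-1).mean()

    return l_d2sm + gamma * l_dsm
```

Under this setup, the second-order head is supervised purely from noisy samples of the data, so no backpropagation through the first-order model is needed at test time.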
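For context on the reported 500× speedup, the automatic-differentiation baseline the paper compares against needs one backward pass per input dimension to recover even the diagonal of the second-order score from a first-order model. A minimal sketch of that baseline, assuming a hypothetical differentiable `score_fn` applied to a single flattened input:

```python
import torch

def diag_second_order_autodiff(score_fn, x):
    """Diagonal of the Hessian of log p(x) via autodiff on a first-order
    score model: one backward pass per dimension, hence O(d) cost per
    input. A learned second-order head produces the same quantity in a
    single forward pass, which is the source of the reported speedups.
    """
    x = x.detach().requires_grad_(True)
    s1 = score_fn(x).flatten()  # first-order score estimate
    diag = torch.empty(s1.numel(), device=x.device)
    for i in range(s1.numel()):
        # d s1_i / d x_i is the (i, i) entry of the Hessian of log p
        g = torch.autograd.grad(s1[i], x, retain_graph=True)[0]
        diag[i] = g.flatten()[i]
    return diag
```

For a 28×28 MNIST image this loop runs 784 backward passes per example, which is consistent with the wall-clock gap reported in Table 2.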