Estimating High Order Gradients of the Data Distribution by Denoising

Authors: Chenlin Meng, Yang Song, Wenzhe Li, Stefano Ermon

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that models trained with the proposed method can approximate second order derivatives more efficiently and accurately than via automatic differentiation. Our experiments show that models learned with the proposed objective can approximate second order scores more accurately than applying automatic differentiation to lower order score models. Our approach is also more computationally efficient for high dimensional data, achieving up to 500× speedups for second order score estimation on MNIST. (A hedged sketch of this autodiff baseline follows the table.)
Researcher Affiliation | Academia | Chenlin Meng (Stanford University, chenlin@cs.stanford.edu); Yang Song (Stanford University, yangsong@cs.stanford.edu); Wenzhe Li (Tsinghua University, lwz21@mails.tsinghua.edu.cn); Stefano Ermon (Stanford University, ermon@cs.stanford.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link or an explicit code release statement.
Open Datasets | Yes | Our approach is also more computationally efficient for high dimensional data, achieving up to 500× speedups for second order score estimation on MNIST. We visualize the diagonal of the estimated Cov[x | x̃] for MNIST and CIFAR-10 [10] in Fig. 3.
Dataset Splits | No | The paper mentions 'test samples' and the 'MNIST test set' but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing needed to reproduce the data partitioning.
Hardware Specification | Yes | We report the wall-clock time, averaged over 7 runs, used for estimating second order scores during test time on a TITAN Xp GPU in Table 2.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers.
Experiment Setup | Yes | We parameterize s1 and s2 with the same model architecture and use a batch size of 10 for both settings. We search for the optimal step size for each method and observe that Ozaki sampling can use a larger step size and converge faster than Langevin dynamics (see Fig. 5). $\mathcal{L}_{\text{joint}}(\theta) = \mathcal{L}_{\text{D2SM}}(\theta) + \gamma\,\mathcal{L}_{\text{DSM}}(\theta)$ (Eq. 14), where $\mathcal{L}_{\text{DSM}}(\theta)$ is defined in Eq. (3) and $\gamma \in \mathbb{R}_{\geq 0}$ is a tunable coefficient.
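
For context on the objective quoted in the Experiment Setup row, the combination in Eq. (14) can be sketched as below. This is a minimal sketch and not the authors' code: the DSM term is the standard denoising score matching loss, while the exact form of the second order (D2SM) term is defined in the paper and is left here as a user-supplied callable (`d2sm_loss` is a hypothetical name).

```python
# Minimal sketch (not the authors' code) of the joint objective in Eq. (14):
#     L_joint(theta) = L_D2SM(theta) + gamma * L_DSM(theta)
# The DSM term below is standard denoising score matching; `d2sm_loss` is a
# hypothetical placeholder for the paper's second order objective.
import torch


def dsm_loss(s1, x, sigma):
    """Standard denoising score matching loss for a first-order score model s1."""
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise      # perturb data with Gaussian noise
    target = -noise / sigma          # score of the perturbation kernel at x_tilde
    return ((s1(x_tilde) - target) ** 2).sum(dim=-1).mean()


def joint_loss(s1, s2, x, sigma, gamma, d2sm_loss):
    """L_joint = L_D2SM + gamma * L_DSM, with the D2SM term supplied by the caller."""
    return d2sm_loss(s1, s2, x, sigma) + gamma * dsm_loss(s1, x, sigma)
```

The tunable coefficient gamma simply trades off how strongly the first-order DSM term regularizes training of the second order model.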
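The automatic differentiation baseline referenced in the Research Type row (obtaining second order scores by differentiating a first-order score model) can also be sketched. This is illustrative only: the network below is a toy stand-in, not the paper's architecture, and the function name is hypothetical.

```python
# Hedged sketch of the autodiff baseline: obtain second order scores by
# differentiating a learned first-order score model s1(x) ~ grad_x log p(x).
# For D-dimensional inputs this needs on the order of D backward passes,
# which is the cost the directly trained second order estimator avoids.
import torch


def second_order_score_via_autodiff(s1, x):
    """Jacobian of s1 at x; for s1 ~ grad log p, this approximates the Hessian of log p."""
    return torch.autograd.functional.jacobian(s1, x)


# Toy usage with a stand-in score network (hypothetical architecture).
D = 8
s1 = torch.nn.Sequential(torch.nn.Linear(D, 64), torch.nn.Tanh(), torch.nn.Linear(64, D))
x = torch.randn(D)
hessian_log_p = second_order_score_via_autodiff(s1, x)  # shape [D, D]
```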