Deep probabilistic subsampling for task-adaptive compressed sensing

Authors: Iris A.M. Huijben, Bastiaan S. Veeling, Ruud J.G. van Sloun

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate strong performance on reconstruction and classification tasks of a toy dataset, MNIST, and CIFAR10 under stringent subsampling rates in both the pixel and the spatial frequency domain. We test the applicability of the proposed task-adaptive DPS framework for three datasets and two distinct tasks: image classification and image reconstruction. The results presented in fig. 2a show that image-domain sampling using DPS significantly outperforms the fixed sampling baselines (uniform and disk).
Researcher Affiliation | Academia | Iris A.M. Huijben, Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands, i.a.m.huijben@tue.nl; Bastiaan S. Veeling, Department of Computer Science, University of Amsterdam, Amsterdam, The Netherlands, basveeling@gmail.com; Ruud J.G. van Sloun, Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands, r.j.g.v.sloun@tue.nl
Pseudocode | Yes | Algorithm 1 Deep Probabilistic Subsampling (DPS). Require: training dataset D, number of iterations n_iter, temperature parameter τ, initialized trainable parameters Φ and θ. Ensure: trained logits matrix Φ and task network parameters θ. (A hedged code sketch of the sampling step follows the table.)
Open Source Code | Yes | The code used for this paper is made publicly available at https://github.com/IamHuijben/Deep-Probabilistic-Subsampling.git
Open Datasets | Yes | Classification performance was tested on the MNIST database (LeCun et al., 1998), comprising 70,000 28×28 grayscale images of handwritten digits 0 to 9. The CIFAR10 database (Krizhevsky et al., 2009) contains 60,000 images of 32×32 pixels in 10 different classes.
Dataset Splits | Yes | We split the dataset into 50,000 training images, 5,000 validation, and 5,000 test images. We converted all images to grayscale, and subsequently split them into 50,000 training images, 5,000 validation and 5,000 test images. (A hedged split sketch follows the table.)
Hardware Specification | No | The paper mentions training models and using optimizers, but does not specify any particular hardware components such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions using the 'ADAM solver' and provides its hyperparameters but does not list specific versions for any software libraries or frameworks (e.g., Python, PyTorch, TensorFlow, etc.).
Experiment Setup | Yes | Task model: After sampling M elements, all N zero-masked samples (or 2N in the case of complex Fourier samples) are passed through a series of 5 fully-connected layers, having N, 256, 128, 128 and 10 output nodes, respectively. The activations for all but the last layer were leaky ReLUs, and 20% dropout was applied after the first three layers. Training details: We train the network to maximize the log-likelihood of the observations D = {(x_i, s_i) | i = 0, . . . , L} through minimization of the categorical cross-entropy between the predictions and the labels, denoted by L_s. Penalty multiplier µ was set to linearly increase by 1e-5 per epoch, starting from 0.0. The temperature parameter τ in eq. (7) was set to 2.0, and the sampling distribution parameters Φ were initialized randomly, following a zero-mean Gaussian distribution with standard deviation 0.25. Equation 9 was optimized using stochastic gradient descent on batches of 32 examples, approximating the expectation by a mean across the train dataset. To that end, we used the ADAM solver (β1 = 0.9, β2 = 0.999, and ϵ = 1e-7). We adopted different learning rates for the sampling parameters Φ and the parameters of the task model θ, being 2e-3 and 2e-4, respectively. For reconstruction tasks, learning rates for Φ and θ were 1e-3 and 1e-4, respectively, and µ and τ were respectively set to 2e-4 and 5.0. Mini-batches of 128 examples were used. For CIFAR10, λ was set to 0.004, learning rates for {Φ, ψ} and θ were 1e-3 and 2e-4, µ was set to 1e-6, and τ was kept constant at 2.0. Batches of 8 images were used. (Hedged code sketches of the task model and optimizer settings follow the table.)
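
The pseudocode row above describes DPS as training a logits matrix Φ jointly with task parameters θ at temperature τ. The following is a minimal, illustrative PyTorch sketch of the sampling step this implies, not the authors' released code: each row of Φ relaxes a one-hot selection over the N candidate sample positions with a Gumbel-softmax draw, and the M selections are merged into a zero-masking pattern. Function names, the straight-through estimator, and the absence of without-replacement masking are simplifying assumptions.

import torch
import torch.nn.functional as F

def sample_mask(phi: torch.Tensor, tau: float = 2.0, hard: bool = True) -> torch.Tensor:
    # One Gumbel-softmax draw per row of the logits matrix: each row relaxes a
    # one-hot choice over the N candidate sample positions.
    rows = F.gumbel_softmax(phi, tau=tau, hard=hard, dim=-1)  # shape (M, N)
    # Union of the per-row selections gives the length-N subsampling mask.
    # Simplification: rows may select the same index; the paper's Algorithm 1
    # samples without replacement.
    return rows.sum(dim=0).clamp(max=1.0)

# Toy usage: select M = 8 out of N = 784 pixel positions of flattened images.
phi = torch.nn.Parameter(0.25 * torch.randn(8, 784))  # logits init as quoted above
x = torch.randn(32, 784)                              # a batch of flattened images
x_sub = x * sample_mask(phi, tau=2.0)                 # zero-masked input for the task model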
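
For the split quoted in the Dataset Splits row (50,000 train / 5,000 validation / 5,000 test, with grayscale conversion for CIFAR10), a hedged torchvision sketch could look as follows. The library choice and the seeded random split of the held-out images are assumptions; the paper does not state how the validation/test division was drawn.

import torch
from torchvision import datasets, transforms

to_gray = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])

# CIFAR10 ships as 50,000 training and 10,000 held-out images.
train_set = datasets.CIFAR10("./data", train=True, download=True, transform=to_gray)
held_out = datasets.CIFAR10("./data", train=False, download=True, transform=to_gray)
val_set, test_set = torch.utils.data.random_split(
    held_out, [5000, 5000], generator=torch.Generator().manual_seed(0)
)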
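
The experiment-setup row quotes the MNIST classification task model (five fully-connected layers with N, 256, 128, 128 and 10 output nodes, leaky ReLUs, 20% dropout after the first three layers) and ADAM with separate learning rates for Φ and θ. A minimal sketch under those numbers, again assuming PyTorch rather than the authors' framework, and with M = 8 as a hypothetical sample budget:

import torch
import torch.nn as nn

N = 784  # number of candidate sample positions (28×28 MNIST pixels)

# Five fully-connected layers with N, 256, 128, 128 and 10 output nodes,
# leaky ReLUs on all but the last layer, 20% dropout after the first three.
task_model = nn.Sequential(
    nn.Linear(N, N), nn.LeakyReLU(), nn.Dropout(0.2),
    nn.Linear(N, 256), nn.LeakyReLU(), nn.Dropout(0.2),
    nn.Linear(256, 128), nn.LeakyReLU(), nn.Dropout(0.2),
    nn.Linear(128, 128), nn.LeakyReLU(),
    nn.Linear(128, 10),  # class logits; categorical cross-entropy applied outside
)

# Separate learning rates for the sampling logits Φ (2e-3) and the task
# parameters θ (2e-4), with the quoted ADAM hyperparameters.
phi = torch.nn.Parameter(0.25 * torch.randn(8, N))
optimizer = torch.optim.Adam(
    [
        {"params": [phi], "lr": 2e-3},
        {"params": task_model.parameters(), "lr": 2e-4},
    ],
    betas=(0.9, 0.999),
    eps=1e-7,
)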