Learning Group Importance using the Differentiable Hypergeometric Distribution
Authors: Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E Vogt
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform three experiments that empirically validate the proposed method and highlight the versatility and applicability of the differentiable hypergeometric distribution to different important areas of machine learning. We first test the generated samples of the proposed differentiable formulation against a non-differentiable reference implementation. Second, we present how the hypergeometric distribution helps detect shared generative factors of paired samples in a weakly-supervised setting. Our third experiment demonstrates the hypergeometric distribution as a prior in variational clustering algorithms. |
| Researcher Affiliation | Academia | Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt Department of Computer Science ETH Zurich Switzerland {thomas.sutter,laura.manduchi,alain.ryser,julia.vogt}@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Sampling from the differentiable hypergeometric distribution. The different blocks are explained in more detail in Sections 4.1 to 4.3 and Algorithm 2. |
| Open Source Code | Yes | The code can be found here: https://github.com/thomassutter/mvhg |
| Open Datasets | Yes | In this experiment, we look at pairs of images from the synthetic mpi3D toy dataset (Gondal et al., 2019). We compare them on three different MNIST versions (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | From these 1000 samples, we use 800 for training and 200 for validation. |
| Hardware Specification | Yes | All our experiments were performed on our internal compute cluster, equipped with NVIDIA RTX 2080 and NVIDIA GTX 1080 GPUs. Every training and test run used only a single NVIDIA RTX 2080 or NVIDIA GTX 1080. |
| Software Dependencies | No | The paper mentions software like Tensorflow, PyTorch, SciPy, and scikit-learn with citations, but does not provide specific version numbers for these software dependencies as used in their experiments. For example, 'Tensorflow (Abadi et al., 2016)' only cites the paper's publication year, not the software version. |
| Experiment Setup | Yes | All experiments were performed using β = 1.0 as this is the best performing β according to Locatello et al. (2020). We set the initial temperature τinit to 10 and the final temperature τfinal to 0.01, which is annealed over nsteps = 50000. We used an initial learning rate of 10^-6 together with the Adam optimizer (Kingma & Ba, 2014) for our final experiments. In particular, the learning rate is set to 0.001, the batch size is set to 128 and the models are trained for 1000 epochs. |
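The first experiment compares samples from the proposed differentiable formulation against a non-differentiable reference implementation. A minimal sketch of such a reference, assuming NumPy's built-in multivariate hypergeometric sampler as the baseline (the group sizes and draw count below are illustrative, not taken from the paper):

```python
import numpy as np

# Non-differentiable reference: draw from the multivariate hypergeometric
# distribution with NumPy. Group sizes and number of draws are made up
# for illustration only.
rng = np.random.default_rng(seed=0)

group_sizes = [10, 20, 30]  # three groups with 10, 20, and 30 items
n_draws = 15                # draw 15 items without replacement

sample = rng.multivariate_hypergeometric(group_sizes, n_draws)
# sample[i] counts how many of the 15 drawn items came from group i;
# the per-group counts always sum to n_draws.
print(sample, sample.sum())
```

Such a sampler yields exact discrete counts but no gradients with respect to the group importance weights, which is the gap the paper's differentiable formulation addresses.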
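The setup anneals the relaxation temperature from τinit = 10 to τfinal = 0.01 over nsteps = 50000. The paper excerpt states only the endpoints and the step count, so the geometric (exponential) interpolation in this sketch is an assumption:

```python
# Hypothetical temperature-annealing schedule: tau goes from TAU_INIT to
# TAU_FINAL over N_STEPS training steps. The geometric interpolation is an
# assumption; only the endpoints and step count come from the paper.
TAU_INIT, TAU_FINAL, N_STEPS = 10.0, 0.01, 50_000

def temperature(step: int) -> float:
    """Geometric interpolation between TAU_INIT and TAU_FINAL, clamped at N_STEPS."""
    frac = min(step, N_STEPS) / N_STEPS
    return TAU_INIT * (TAU_FINAL / TAU_INIT) ** frac

print(temperature(0), temperature(25_000), temperature(50_000))
```

A geometric schedule keeps the relative temperature decrease per step constant, which is a common choice for Gumbel-softmax-style relaxations; a linear schedule would be an equally plausible reading of the excerpt.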