Learning Group Importance using the Differentiable Hypergeometric Distribution
Authors: Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E Vogt
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform three experiments that empirically validate the proposed method and highlight the versatility and applicability of the differentiable hypergeometric distribution to different important areas of machine learning. We first test the generated samples of the proposed differentiable formulation against a non-differentiable reference implementation. Second, we present how the hypergeometric distribution helps detect shared generative factors of paired samples in a weakly-supervised setting. Our third experiment demonstrates the hypergeometric distribution as a prior in variational clustering algorithms. |
| Researcher Affiliation | Academia | Thomas M. Sutter, Laura Manduchi, Alain Ryser, Julia E. Vogt Department of Computer Science ETH Zurich Switzerland {thomas.sutter,laura.manduchi,alain.ryser,julia.vogt}@inf.ethz.ch |
| Pseudocode | Yes | Algorithm 1 Sampling from the differentiable hypergeometric distribution. The different blocks are explained in more detail in Sections 4.1 to 4.3 and Algorithm 2. |
| Open Source Code | Yes | The code can be found here: https://github.com/thomassutter/mvhg |
| Open Datasets | Yes | In this experiment, we look at pairs of images from the synthetic mpi3D toy dataset (Gondal et al., 2019). We compare them on three different MNIST versions (LeCun & Cortes, 2010). |
| Dataset Splits | Yes | From these 1000 samples, we use 800 for training and 200 for validation. |
| Hardware Specification | Yes | All our experiments were performed on our internal compute cluster, equipped with NVIDIA RTX 2080 and NVIDIA GTX 1080 GPUs. Every training and test run used only a single NVIDIA RTX 2080 or NVIDIA GTX 1080. |
| Software Dependencies | No | The paper mentions software like Tensorflow, PyTorch, SciPy, and scikit-learn with citations, but does not provide specific version numbers for these software dependencies as used in their experiments. For example, 'Tensorflow (Abadi et al., 2016)' only cites the paper's publication year, not the software version. |
| Experiment Setup | Yes | All experiments were performed using β = 1.0 as this is the best performing β according to Locatello et al. (2020). We set the initial temperature τinit to 10 and the final temperature τfinal to 0.01, which is annealed over nsteps = 50000. We used an initial learning rate of 10^-6 together with the Adam optimizer (Kingma & Ba, 2014) for our final experiments. In particular, the learning rate is set to 0.001, the batch size is set to 128 and the models are trained for 1000 epochs. |
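The first experiment compares samples from the proposed differentiable formulation against a non-differentiable reference implementation. A minimal sketch of such a reference, assuming NumPy's built-in multivariate hypergeometric sampler as the baseline (the group sizes and draw count below are illustrative, not taken from the paper):

```python
import numpy as np

# Non-differentiable reference: draw from the multivariate hypergeometric
# distribution with NumPy. Group sizes and number of draws are made up
# for illustration only.
rng = np.random.default_rng(seed=0)

group_sizes = [10, 20, 30]  # three groups with 10, 20, and 30 items
n_draws = 15                # draw 15 items without replacement

sample = rng.multivariate_hypergeometric(group_sizes, n_draws)
# sample[i] counts how many of the 15 drawn items came from group i;
# the per-group counts always sum to n_draws.
print(sample, sample.sum())
```

Such a sampler yields exact discrete counts but no gradients with respect to the group importance weights, which is the gap the paper's differentiable formulation addresses.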
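The setup anneals the relaxation temperature from τinit = 10 to τfinal = 0.01 over nsteps = 50000. The paper excerpt states only the endpoints and the step count, so the geometric (exponential) interpolation in this sketch is an assumption:

```python
# Hypothetical temperature-annealing schedule: tau goes from TAU_INIT to
# TAU_FINAL over N_STEPS training steps. The geometric interpolation is an
# assumption; only the endpoints and step count come from the paper.
TAU_INIT, TAU_FINAL, N_STEPS = 10.0, 0.01, 50_000

def temperature(step: int) -> float:
    """Geometric interpolation between TAU_INIT and TAU_FINAL, clamped at N_STEPS."""
    frac = min(step, N_STEPS) / N_STEPS
    return TAU_INIT * (TAU_FINAL / TAU_INIT) ** frac

print(temperature(0), temperature(25_000), temperature(50_000))
```

A geometric schedule keeps the relative temperature decrease per step constant, which is a common choice for Gumbel-softmax-style relaxations; a linear schedule would be an equally plausible reading of the excerpt.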