Learning the Pareto Front with Hypernetworks
Authors: Aviv Navon, Aviv Shamsian, Ethan Fetaya, Gal Chechik
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a wide set of problems, from multi-task regression and classification to fairness. PHNs learn the entire Pareto front in roughly the same time as learning a single point on the front, while reaching a better solution set. |
| Researcher Affiliation | Collaboration | Aviv Navon Bar-Ilan University, Israel aviv.navon@biu.ac.il Aviv Shamsian Bar-Ilan University, Israel aviv.shamsian@biu.ac.il Ethan Fetaya Bar-Ilan University, Israel ethan.fetaya@biu.ac.il Gal Chechik Bar-Ilan University, Israel NVIDIA, Israel gal.chechik@biu.ac.il |
| Pseudocode | Yes | Algorithm 1 PHN: while not converged do: sample r ~ Dir(α); θ(φ, r) = h(r; φ); sample mini-batch (x_1, y_1), …, (x_B, y_B); if LS: g_φ = (1/B) Σ_{i,j} r_i ∇_φ ℓ_i(x_j, y_j, θ(φ, r)); if EPO: β = EPO(θ(φ, r), ℓ, r), g_φ = (1/B) Σ_{i,j} β_i ∇_φ ℓ_i(x_j, y_j, θ(φ, r)); φ ← φ − η g_φ; return φ |
| Open Source Code | Yes | We make our source code publicly available at: https://github.com/AvivNavon/pareto-hypernetworks. |
| Open Datasets | Yes | (1) Multi-MNIST (Sabour et al., 2017); (2) Multi-Fashion; and (3) Multi-Fashion + MNIST. In each dataset, two instances are sampled uniformly at random from the MNIST (LeCun et al., 1998) or Fashion-MNIST (Xiao et al., 2017) datasets. ... Adult (Dua & Graff, 2017), Default (Yeh & Lien, 2009) and Bank (Moro et al., 2014). ... NYUv2 dataset (Silberman et al., 2012). ... SARCOS dataset (Vijayakumar), a commonly used dataset for multitask regression (Zhang & Yang, 2017). |
| Dataset Splits | Yes | We allocate 10% of each training set for constructing validation sets. ... Each dataset is divided into train/validation/test sets of sizes 70%/10%/20% respectively. ... We use 40,036 training examples, 4,448 validation examples, and 4,449 test examples. |
| Hardware Specification | Yes | Run-time (min., Tesla V100) |
| Software Dependencies | No | The paper mentions using 'Adam (Kingma & Ba, 2015) optimizer' but does not specify version numbers for any software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | For our hypernetwork, we use an MLP with 2 hidden layers and linear heads. We set the hidden dimension to 100 for the Multi-MNIST and NYU experiment, and 25 for the Fairness and SARCOS datasets. The Dirichlet parameter α in Alg. 1 is set to 0.2 for all experiments... We train all methods using an Adam optimizer with learning rate 1e-4 for 150 epochs and batch size 256. |
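The pseudocode and setup rows above can be combined into a runnable sketch. This is a minimal toy illustration of the PHN training loop with linear scalarization (LS), assuming PyTorch: the two-layer MLP hypernetwork, the Dirichlet(α = 0.2) preference sampling, and the Adam optimizer follow the reported setup, while the tiny linear target model, its two conflicting regression losses, the layer sizes, step count, and learning rate are illustrative assumptions, not the paper's configuration.

```python
# Toy sketch of Algorithm 1 (PHN, LS variant). The target "network" is a
# 2-feature linear model with a bias, so theta has 3 entries; the two task
# losses pull the bias toward +1 and -1 respectively (an assumed toy problem).
import torch
import torch.nn as nn

N_TASKS = 2
TARGET_DIM = 3  # theta = (w_1, w_2, b) of the toy target model

class ParetoHypernetwork(nn.Module):
    """h(r; phi): maps a preference vector r to target-network weights theta."""
    def __init__(self, n_tasks=N_TASKS, hidden=25, target_dim=TARGET_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_tasks, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, target_dim),  # linear head emitting theta
        )

    def forward(self, r):
        return self.net(r)

def train_phn_ls(steps=1000, alpha=0.2, lr=1e-2, batch=64, seed=0):
    torch.manual_seed(seed)
    phn = ParetoHypernetwork()
    opt = torch.optim.Adam(phn.parameters(), lr=lr)
    dirichlet = torch.distributions.Dirichlet(torch.full((N_TASKS,), alpha))
    for _ in range(steps):
        r = dirichlet.sample()                 # r ~ Dir(alpha)
        theta = phn(r)                         # theta(phi, r) = h(r; phi)
        x = torch.randn(batch, 2)              # toy mini-batch
        pred = x @ theta[:2] + theta[2]        # target model: x.w + b
        l1 = ((pred - 1.0) ** 2).mean()        # task 1 wants b -> +1
        l2 = ((pred + 1.0) ** 2).mean()        # task 2 wants b -> -1
        loss = r[0] * l1 + r[1] * l2           # LS: sum_i r_i * l_i
        opt.zero_grad()
        loss.backward()                        # g_phi via autograd
        opt.step()                             # phi <- phi - eta * g_phi
    return phn

phn = train_phn_ls()
```

After training, a single set of hypernetwork weights covers the whole preference simplex: evaluating `phn` at different `r` (e.g. `[1, 0]` vs `[0, 1]`) yields different target weights, tracing out the Pareto front at inference time with no retraining.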