Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Authors: Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to further demonstrate the three-way trade-off among the optimization, generalization, and conflict avoidance of the MoDo algorithm. Averages over 10 random seeds with 0.5 standard deviation are reported if not otherwise specified. We use the following synthetic example for the experiments in the strongly convex case. We also simulate a multi-objective optimization problem by applying different loss functions to training an image classifier on the MNIST handwritten digit dataset. |
| Researcher Affiliation | Academia | Lisha Chen Rensselaer Polytechnic Institute Troy, NY, United States chenl21@rpi.edu Heshan Fernando Rensselaer Polytechnic Institute Troy, NY, United States fernah@rpi.edu Yiming Ying University of Sydney Camperdown, Australia yiming.ying@sydney.edu.au Tianyi Chen Rensselaer Polytechnic Institute Troy, NY, United States chentianyi19@gmail.com |
| Pseudocode | Yes | Algorithm 1 Stochastic MGDA (MoDo). 1: input: training data S, initial model x_0, weighting coefficient λ_0, and their learning rates {α_t}_{t=0}^T, {γ_t}_{t=0}^T. 2: for t = 0, ..., T − 1 do 3: for objective m = 1, ..., M do 4: compute independent gradient samples ∇f_{m,z_{t,s}}(x_t), s ∈ [3] 5: end for 6: compute dynamic weight λ_{t+1} following (2.5a) 7: update model parameter x_{t+1} following (2.5b) 8: end for 9: output: x_T |
| Open Source Code | Yes | Code is available at https://github.com/heshandevaka/Trade-Off-MOL. |
| Open Datasets | Yes | We use MNIST image classification [21] with a multi-layer perceptron and three objectives: cross-entropy, mean squared error (MSE), and Huber loss. The training, validation, and testing data sizes are 50k, 10k, and 10k, respectively. We give the details for the experiments conducted using the Office-31 and Office-Home datasets, which consist of multi-domain image classification tasks. Both of these are multi-input single-task learning problems. Office-31 and Office-Home consist of 31 and 65 image classes, respectively. We give the details for the experiments conducted using the NYU-v2 dataset, which consists of image segmentation, depth estimation, and surface normal estimation tasks. The dataset consists of images from indoor video sequences. |
| Dataset Splits | Yes | The training, validation, and testing data sizes are 50k, 10k, and 10k, respectively. Hyperparameters such as step sizes are chosen based on each algorithm's validation accuracy, as given in Table 6. |
| Hardware Specification | Yes | Experiments are done on a machine with GPU NVIDIA RTX A5000. |
| Software Dependencies | Yes | We use MATLAB R2021a for the synthetic experiments in strongly convex case, and Python 3.8, CUDA 11.7, Pytorch 1.8.0 for other experiments. |
| Experiment Setup | Yes | The default parameters are T = 100, α = 0.01, γ = 0.001. We set M = 3, b_1 = [b_{1,1}; b_{1,2}; b_{1,3}] = [1; 2; 1], and b_2 = [b_{2,1}; b_{2,2}; b_{2,3}] = [1; 3; 2]. The training dataset size is n = \|S\| = 20. For all methods, i.e., MGDA, static weighting, and MoDo, the number of iterations is T = 50000. The initialization of λ is λ_0 = [0.5, 0.5]^⊤. The hyperparameters for this experiment are summarized in Table 5. The model architecture is a two-layer multi-layer perceptron (MLP). Each hidden layer has 512 neurons, and no hidden-layer activation is used. We use batch size 64 to update static weighting and MGDA, and use 2 independent samples of batch size 32 to update MoDo, for both Office-31 and Office-Home. |
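The pseudocode above can be sketched in NumPy. This is a minimal illustration, not the authors' released code: it assumes the update forms behind (2.5a)/(2.5b) are a projected gradient step on the simplex weights using two independent gradient samples, followed by a model step along the λ-weighted direction using a third sample (the three independent samples s ∈ [3] in Algorithm 1). The toy quadratic objectives at the bottom are hypothetical stand-ins for the paper's synthetic strongly convex example.

```python
import numpy as np

def simplex_proj(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def modo(grad_fns, x0, lam0, alpha=0.01, gamma=0.001, T=100, rng=None):
    """Sketch of the MoDo double-sampling loop (assumed update forms).

    grad_fns: list of M stochastic gradient oracles; grad_fns[m](x, rng)
              returns one independent gradient sample of objective m at x.
    """
    rng = rng or np.random.default_rng(0)
    x, lam = x0.copy(), lam0.copy()
    for _ in range(T):
        # Three independent (M, d) gradient matrices per iteration: s in [3].
        J = [np.stack([g(x, rng) for g in grad_fns]) for _ in range(3)]
        # (2.5a), assumed form: dynamic-weight step using two independent
        # samples, projected back onto the simplex.
        lam = simplex_proj(lam - gamma * (J[0] @ J[1].T) @ lam)
        # (2.5b), assumed form: model step along the lambda-weighted
        # direction, using the third independent sample.
        x = x - alpha * (J[2].T @ lam)
    return x, lam

# Hypothetical noisy strongly convex quadratics, M = 3 objectives in d = 5 dims.
d = 5
A = [np.eye(d) * (m + 1) for m in range(3)]
b = [np.ones(d) * m for m in range(3)]
grads = [lambda x, rng, A=A[m], b=b[m]: A @ x - b + 0.1 * rng.standard_normal(d)
         for m in range(3)]
x, lam = modo(grads, x0=np.zeros(d), lam0=np.ones(3) / 3, T=200)
```

Note the design choice the paper's Algorithm 1 emphasizes: the λ-update and the x-update consume *independent* gradient samples, which is what decouples the bias of the dynamic weighting from the model update in the stochastic setting.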