Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance

Authors: Lisha Chen, Heshan Fernando, Yiming Ying, Tianyi Chen

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we conduct experiments to further demonstrate the three-way trade-off among optimization, generalization, and conflict avoidance of the MoDo algorithm. An average over 10 random seeds with 0.5 standard deviation is reported unless otherwise specified. We use the following synthetic example for the experiments in the strongly convex case. We also simulate a multi-objective optimization problem by applying different loss functions to train an image classifier on the MNIST handwritten digit dataset.
Researcher Affiliation Academia Lisha Chen Rensselaer Polytechnic Institute Troy, NY, United States chenl21@rpi.edu Heshan Fernando Rensselaer Polytechnic Institute Troy, NY, United States fernah@rpi.edu Yiming Ying University of Sydney Camperdown, Australia yiming.ying@sydney.edu.au Tianyi Chen Rensselaer Polytechnic Institute Troy, NY, United States chentianyi19@gmail.com
Pseudocode Yes
Algorithm 1 Stochastic MGDA (MoDo) algorithm
1: input: training data S, initial model x_0, weighting coefficient λ_0, and their learning rates {α_t}_{t=0}^T, {γ_t}_{t=0}^T.
2: for t = 0, ..., T − 1 do
3:   for objective m = 1, ..., M do
4:     Compute independent stochastic gradients ∇f_{m,z_{t,s}}(x_t), s ∈ [3]
5:   end for
6:   Compute dynamic weight λ_{t+1} following (2.5a)
7:   Update model parameter x_{t+1} following (2.5b)
8: end for
9: output: x_T
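The updates (2.5a)–(2.5b) are referenced but not reproduced in this report. Below is a minimal numpy sketch of one MoDo-style double-sampling iteration, assuming the weight update takes a projected-gradient form λ ← Π_Δ(λ − γ G₁G₂ᵀλ) and the model update uses the third sample, x ← x − α G₃ᵀλ. The exact update form, the toy gradient oracle, and the simplex-projection routine are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex.

    Standard sort-based routine; assumed here, not specified by the paper.
    """
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    # largest index rho with u_rho + (1 - cumsum_rho) / (rho + 1) > 0
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def modo_step(x, lam, grad_fn, alpha, gamma, rng):
    """One MoDo-style iteration using three independent stochastic gradient samples.

    grad_fn(x, rng) -> (M, d) matrix whose rows are per-objective stochastic
    gradients. Sketch of (2.5a): lam <- proj_simplex(lam - gamma * G1 @ G2.T @ lam),
    and of (2.5b): x <- x - alpha * G3.T @ lam.
    """
    g1, g2, g3 = (grad_fn(x, rng) for _ in range(3))   # s in [3]: independent samples
    lam = project_simplex(lam - gamma * g1 @ (g2.T @ lam))  # dynamic weight update
    x = x - alpha * g3.T @ lam                              # weighted model update
    return x, lam
```

On a toy problem with M quadratic objectives f_m(x) = ½‖x − c_m‖², iterating `modo_step` keeps λ on the simplex while x moves toward the Pareto set of the c_m.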
Open Source Code Yes Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
Open Datasets Yes We use MNIST image classification [21] using a multi-layer perceptron and three objectives: cross-entropy, mean squared error (MSE), and Huber loss. The training, validation, and testing data sizes are 50k, 10k, and 10k, respectively. We give the details for the experiments conducted using Office-31 and Office-home datasets, which consist of multi-domain image classification tasks. Both of these are multi-input single-task learning problems. Office-31 and Office-home consist of 31 and 65 image classes, respectively. We give the details for the experiments conducted using NYU-v2 dataset, which consists of image segmentation, depth estimation, and surface normal estimation tasks. The dataset consists of images from indoor video sequences.
Dataset Splits Yes The training, validation, and testing data sizes are 50k, 10k, and 10k, respectively. Hyperparameters such as step sizes are chosen based on each algorithm's validation accuracy performance, as given in Table 6.
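The split procedure itself is not quoted. Since MNIST ships with 60k training and 10k test images, one plausible reading of the 50k/10k/10k sizes is that the validation set is carved out of the training split; a minimal sketch under that assumption:

```python
import numpy as np

# Assumption: the 10k validation set is sampled from MNIST's 60k training
# images, leaving 50k for training; the official 10k test set is kept as-is.
rng = np.random.default_rng(0)
idx = rng.permutation(60_000)                      # shuffle training indices
train_idx, val_idx = idx[:50_000], idx[50_000:]    # 50k train / 10k validation
```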
Hardware Specification Yes Experiments are done on a machine with GPU NVIDIA RTX A5000.
Software Dependencies Yes We use MATLAB R2021a for the synthetic experiments in the strongly convex case, and Python 3.8, CUDA 11.7, and PyTorch 1.8.0 for the other experiments.
Experiment Setup Yes The default parameters are T = 100, α = 0.01, γ = 0.001. We set M = 3, b_1 = [b_{1,1}; b_{1,2}; b_{1,3}] = [1; 2; 1], and b_2 = [b_{2,1}; b_{2,2}; b_{2,3}] = [1; 3; 2]. The training dataset size is n = |S| = 20. For all methods, i.e., MGDA, static weighting, and MoDo, the number of iterations is T = 50000. The initialization of λ is λ_0 = [0.5, 0.5]^T. The hyperparameters for this experiment are summarized in Table 5. The model architecture is a two-layer multi-layer perceptron (MLP). Each hidden layer has 512 neurons, and no hidden-layer activation is used. The input size is 784, and the output size is 10, the number of digit classes. We use batch size 64 to update static weighting and MGDA, and use 2 independent samples of batch size 32 to update MoDo, for both Office-31 and Office-home.
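As a sanity check on the MNIST architecture described above, here is a minimal numpy sketch of the forward pass (784 → 512 → 10, no hidden-layer activation — so the network is linear overall). The layer sizes and batch size are from the report; the initialization scale and bias terms are illustrative assumptions:

```python
import numpy as np

def mlp_forward(x, W1, bias1, W2, bias2):
    """Two-layer MLP with no hidden-layer activation (a linear map overall)."""
    h = x @ W1 + bias1       # hidden layer: 512 units, no nonlinearity
    return h @ W2 + bias2    # output layer: 10 logits, one per digit class

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(784, 512)); bias1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10)); bias2 = np.zeros(10)
batch = rng.random((64, 784))   # batch size 64, as used for static weighting / MGDA
logits = mlp_forward(batch, W1, bias1, W2, bias2)
```

Because there is no hidden nonlinearity, the two layers compose into a single 784 × 10 linear map; the factored 784 → 512 → 10 parameterization still changes the optimization geometry.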