Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Selective Task Group Updates for Multi-Task Optimization

Authors: Wooseong Jeong, Kuk-Jin Yoon

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental settings. We assess the proposed techniques using three datasets: NYUD-V2 for indoor vision tasks (Silberman et al., 2012), PASCAL-Context for outdoor scenarios (Mottaghi et al., 2014), and Taskonomy (Zamir et al., 2018) for large number of tasks. Multi-task performance is compared using the metric introduced by Maninis et al. (2019). This metric calculates the per-task performance by averaging it relative to the single-task baseline b: $\Delta_m = \frac{1}{K} \sum_{i=1}^{K} (-1)^{l_i} (M_{m,i} - M_{b,i}) / M_{b,i}$, where $l_i = 1$ if a lower value of measure $M_i$ indicates better performance for task $\tau_i$, and 0 otherwise. More details are introduced in Appendix C.
Researcher Affiliation	Academia	Wooseong Jeong & Kuk-Jin Yoon Korea Advanced Institute of Science and Technology EMAIL
Pseudocode	Yes	Algorithm 1: Tracking Proximal Inter-Task Affinity for Task Group Updates
Open Source Code	No	We implement our experiments on top of publicly available code from Ye & Xu (2022b). We run our experiments on A6000 GPUs.
Open Datasets	Yes	We assess the proposed techniques using three datasets: NYUD-V2 for indoor vision tasks (Silberman et al., 2012), PASCAL-Context for outdoor scenarios (Mottaghi et al., 2014), and Taskonomy (Zamir et al., 2018) for large number of tasks.
Dataset Splits	No	We assess the proposed techniques using three datasets: NYUD-V2 for indoor vision tasks (Silberman et al., 2012), PASCAL-Context for outdoor scenarios (Mottaghi et al., 2014), and Taskonomy (Zamir et al., 2018) for large number of tasks. Explanation: The paper mentions using well-known datasets but does not explicitly provide details about the training, validation, or test splits (e.g., percentages or sample counts) used for these datasets in the provided text.
Hardware Specification	Yes	We run our experiments on A6000 GPUs.
Software Dependencies	No	Table C.1: Hyperparameters for experiments. Optimizer: Adam (Kingma & Ba, 2014); Scheduler: Polynomial Decay; Minibatch size: 8; Number of iterations: 40000; Backbone (Transformer): ViT (Dosovitskiy et al., 2020); Learning rate: 0.00002; Weight decay: 0.000001; Affinity decay factor β: 0.001. Explanation: The paper lists software components like "Adam" and "ViT" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	Implementation Details. For experiments, we adopt ViT (Dosovitskiy et al., 2020) pre-trained on ImageNet-22K (Deng et al., 2009) as the multi-task encoder. Task-specific decoders merge the multi-scale features extracted by the encoder to generate the outputs for each task. The models are trained for 40,000 iterations on both NYUD (Silberman et al., 2012) and PASCAL (Everingham & Winn, 2012) datasets with batch size 8. We used the Adam optimizer with a learning rate of 2 × 10^-5 and a weight decay of 1 × 10^-6, with a polynomial learning rate schedule. The cross-entropy loss was used for semantic segmentation, human parts estimation, saliency, and edge detection. Surface normal prediction and depth estimation used L1 loss. The tasks are weighted equally to ensure a fair comparison. For the Taskonomy Benchmark (Zamir et al., 2018), we use the dataloader from the open-access code provided by Chen et al. (2023), while maintaining experimental settings identical to those used for NYUD-v2 and PASCAL-Context. We use the same experimental setup for the other hyperparameters as in previous works (Ye & Xu, 2022a;c), as detailed in Table C.1. Table C.1: Hyperparameters for experiments. Optimizer: Adam (Kingma & Ba, 2014); Scheduler: Polynomial Decay; Minibatch size: 8; Number of iterations: 40000; Backbone (Transformer): ViT (Dosovitskiy et al., 2020); Learning rate: 0.00002; Weight decay: 0.000001; Affinity decay factor β: 0.001.
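
The multi-task metric quoted in the Research Type row can be illustrated with a small calculation. The following is a minimal sketch, not the authors' evaluation code: the function name `multi_task_delta`, its arguments, and the two-task numbers are hypothetical and chosen only to make the arithmetic easy to follow. It returns the average relative gain as a fraction; multiply by 100 for a percentage.

```python
from typing import Sequence


def multi_task_delta(
    multi: Sequence[float],
    single: Sequence[float],
    lower_is_better: Sequence[bool],
) -> float:
    """Average relative per-task gain of a multi-task model over single-task baselines.

    multi[i]           -- measure M_{m,i} of the multi-task model on task i
    single[i]          -- measure M_{b,i} of the single-task baseline on task i
    lower_is_better[i] -- l_i: True if a lower value means better performance
    """
    if not multi or not (len(multi) == len(single) == len(lower_is_better)):
        raise ValueError("all inputs must be non-empty and of equal length")

    total = 0.0
    for m, b, lower in zip(multi, single, lower_is_better):
        sign = -1.0 if lower else 1.0  # (-1)^{l_i}
        total += sign * (m - b) / b    # relative change w.r.t. the baseline
    return total / len(multi)          # average over the K tasks


# Hypothetical numbers (not taken from the paper):
# task 0 is higher-is-better (e.g. an accuracy-like score), 40.0 -> 44.0
# task 1 is lower-is-better (e.g. an error-like score), 0.60 -> 0.54
delta = multi_task_delta([44.0, 0.54], [40.0, 0.60], [False, True])
print(f"{delta:.2f}")  # both tasks improve by 10%, so the average is about 0.10
```

The sign flip is what lets error-type and accuracy-type measures be averaged together: an improvement on either kind of task contributes a positive term.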