PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Authors: Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Jay Shi, Shuyang Cheng, Dragomir Anguelov

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that the optimal choice within the PolyLoss is indeed dependent on the task and dataset. Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object detection, and 3D object detection tasks, sometimes by a large margin.
Researcher Affiliation | Industry | Zhaoqi Leng1, Mingxing Tan1, Chenxi Liu1, Ekin Dogus Cubuk2, Xiaojie Shi2, Shuyang Cheng1, Dragomir Anguelov1; 1Waymo LLC, 2Google LLC
Pseudocode | Yes | Example code for the cross-entropy Poly-1 loss (L_Poly-1) with softmax activation is shown below.

    import tensorflow as tf

    def poly1_cross_entropy(logits, labels, epsilon):
        # epsilon >= -1.
        # pt, CE, and Poly1 have shape [batch].
        pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1)
        CE = tf.nn.softmax_cross_entropy_with_logits(labels, logits)
        Poly1 = CE + epsilon * (1 - pt)
        return Poly1
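A minimal usage sketch follows (not from the paper); the batch size, class count, label indices, and the choice of epsilon = 2.0 are illustrative assumptions, and poly1_cross_entropy is the function defined in the row above.

    import tensorflow as tf

    # Dummy batch: 4 examples, 10 classes, one-hot labels (values are arbitrary).
    logits = tf.random.normal([4, 10])
    labels = tf.one_hot([1, 3, 5, 7], depth=10)

    # Per-example Poly-1 losses, reduced to a scalar for training.
    loss = tf.reduce_mean(poly1_cross_entropy(logits, labels, epsilon=2.0))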
Open Source Code | Yes | Our experiments are based on public datasets and open source code repositories, shown in footnote 3-6. [...] Code at https://github.com/tensorflow/tpu/tree/master/models/official/ [...] Code at https://github.com/google/automl/tree/master/efficientnetv2 [...] Code at https://github.com/tensorflow/lingvo/tree/master/lingvo/tasks/car
Open Datasets | Yes | On ImageNet (Deng et al., 2009), our PolyLoss improves both pretraining and finetuning for the recent EfficientNetV2 (Tan & Le, 2021); on COCO (Lin et al., 2014), PolyLoss improves both 2D detection and segmentation AR for Mask R-CNN (He et al., 2017); on Waymo Open Dataset (WOD) (Sun et al., 2020), PolyLoss improves 3D detection AP for the widely used PointPillars (Lang et al., 2019) and the very recent Range Sparse Net (RSN) (Sun et al., 2021).
Dataset Splits | Yes | We reserve 25,000 images from the training set as minival to search the optimal ϵ1.
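A hedged sketch (assumed, not the authors' code) of carving such a 25,000-image minival out of the ImageNet training split; the tensorflow_datasets loader and the absence of shuffling before the split are assumptions.

    import tensorflow_datasets as tfds

    # Hypothetical: load the ImageNet-1k training split as a tf.data.Dataset.
    train = tfds.load("imagenet2012", split="train", shuffle_files=False)

    # Hold out 25,000 examples as minival; train on the remainder.
    minival = train.take(25_000)
    train_without_minival = train.skip(25_000)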
Hardware Specification | No | The paper does not explicitly mention specific hardware specifications such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions using TensorFlow, but does not specify exact version numbers for TensorFlow or other software libraries/dependencies.
Experiment Setup | Yes | We use ResNet-50 (He et al., 2016) and its training hyperparameters without modification. [...] For the following experiments, we adopt the default training hyperparameters in the public repositories without any tuning. [...] We set ϵ1 = 2 for both. [...] In training Mask R-CNN, we use the training schedule optimized for cross-entropy loss, and replace the cross-entropy loss with L_Poly-1 = -log(P_t) + ϵ1(1 - P_t) for the classification loss L_cls, where ϵ1 ∈ {−1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.5, 1.0}.
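To make the ϵ1 sweep above concrete, here is a small sketch (assumed, not the paper's code) that builds one Poly-1 classification loss per candidate ϵ1 by reusing poly1_cross_entropy from the Pseudocode row; how each loss is wired into the Mask R-CNN classification head is left out.

    import functools

    # Candidate epsilon_1 values from the search grid quoted above.
    EPSILON_GRID = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.5, 1.0]

    def make_poly1_loss(epsilon):
        # L_Poly-1 = -log(P_t) + epsilon * (1 - P_t), where CE = -log(P_t)
        # is supplied inside poly1_cross_entropy(logits, labels, epsilon).
        return functools.partial(poly1_cross_entropy, epsilon=epsilon)

    # One candidate loss per epsilon_1; each would correspond to one training run.
    candidate_losses = {eps: make_poly1_loss(eps) for eps in EPSILON_GRID}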