reproducibilityindex.ai

Optimizing Ratio of Monotone Set Functions

Authors: Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou

IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conducted experiments on F-measure maximization to investigate the actual performance of PORM. We compare PORM only with Greed Ratio, since it is the previous algorithm achieving the best empirical performance [Bai et al., 2016]. The generalized form of F-measure is used for evaluation, i.e., Fp(X) = (\|Γ(X) O\|)/(p\|O\|+(1 p)\|Γ(X)\|). We will test p = {0.2, . . . , 0.8}. Note that the F-measure in Deﬁnition 4 is just F0.5. For PORM, the number T of iterations is set to 3en2(2+log \|Γ(Xinit)\|) (where Xinit is the initial subset generated by PORM), as suggested by Theorem 2. Note that we have used the lower bound 1 for 1 1 κf . We use synthetic (syn-100, syn-1000) as well as real-world (ldc-100, ldc-1000) data sets. For syn-100, a bipartite graph G(V, W, E) with \|V \| = \|W\| = 100 is randomly generated by adding one edge between v V and w W independently with probability 0.05; the target set O with \|O\| = 20 is randomly chosen from W. For syn-1000, \|V \| = \|W\| = 1000, \|O\| = 100 and each edge is added with probability 0.01. ldc-100 and ldc-1000 contain 100 and 1000 English sentences, respectively, which are randomly sampled from the LDC2002E18 text data (https://www.ldc.upenn.edu/). Figure 1 shows the percentages of the solution quality that PORM improves from Greed Ratio, where we can observe that PORM is always better than Greed Ratio and can have more than 3% improvement. Comparing the running time (in the number of function calculations), Greed Ratio takes the time in the order of n2; PORM is set to use 3en2(2 + log \|Γ(Xinit)\|) time according to the theoretical upper bound (i.e., a worst case) for PORM being good. We empirically examine how effective PORM is in practice. By selecting Greed Ratio as the baseline, we plot the curve of the F-measure over the running time for PORM on syn-100 and ldc-100 with p = 0.8, as shown in Figure 2. The x-axis is in n2, the running time of Greed Ratio. We can observe that PORM takes about only 6% (3/52) and 3% (2/69) of the worst-case running time to achieve a better performance, respectively. This implies that PORM can be efﬁcient in practice.
Researcher Affiliation	Academia	Chao Qian1, Jing-Cheng Shi2, Yang Yu2, Ke Tang1, Zhi-Hua Zhou2 1UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China 2National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China {chaoqian, ketang}@ustc.edu.cn, {shijc, yuy, zhouzh}@lamda.nju.edu.cn
Pseudocode	Yes	Algorithm 1 Greed Ratio Algorithm Input: monotone submodular functions f, g : 2V R+ Output: a subset X V Process: ... Algorithm 2 PORM algorithm Input: a monotone submodular function f : {0, 1}n R+ and a monotone function g : {0, 1}n R+ Parameter: the number T of iterations Output: a solution x {0, 1}n Process: ...
Open Source Code	No	The paper does not provide any links to open-source code or explicitly state that code for the described methodology is publicly available.
Open Datasets	Yes	ldc-100 and ldc-1000 contain 100 and 1000 English sentences, respectively, which are randomly sampled from the LDC2002E18 text data (https://www.ldc.upenn.edu/).
Dataset Splits	No	The paper does not explicitly provide training/validation/test dataset splits. It describes how synthetic datasets are generated and real-world datasets are sampled, but not how they are partitioned for experimental evaluation.
Hardware Specification	No	The paper does not specify the hardware used for its experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	For PORM, the number T of iterations is set to 3en2(2+log \|Γ(Xinit)\|) (where Xinit is the initial subset generated by PORM), as suggested by Theorem 2. ... The generalized form of F-measure is used for evaluation, i.e., Fp(X) = (\|Γ(X) O\|)/(p\|O\|+(1 p)\|Γ(X)\|). We will test p = {0.2, . . . , 0.8}. For syn-100, a bipartite graph G(V, W, E) with \|V \| = \|W\| = 100 is randomly generated by adding one edge between v V and w W independently with probability 0.05; the target set O with \|O\| = 20 is randomly chosen from W. For syn-1000, \|V \| = \|W\| = 1000, \|O\| = 100 and each edge is added with probability 0.01.