Optimizing Ratio of Monotone Set Functions
Authors: Chao Qian, Jing-Cheng Shi, Yang Yu, Ke Tang, Zhi-Hua Zhou
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments on F-measure maximization to investigate the actual performance of PORM. We compare PORM only with Greed Ratio, since it is the previous algorithm achieving the best empirical performance [Bai et al., 2016]. The generalized form of F-measure is used for evaluation, i.e., Fp(X) = (|Γ(X) O|)/(p|O|+(1 p)|Γ(X)|). We will test p = {0.2, . . . , 0.8}. Note that the F-measure in Definition 4 is just F0.5. For PORM, the number T of iterations is set to 3en2(2+log |Γ(Xinit)|) (where Xinit is the initial subset generated by PORM), as suggested by Theorem 2. Note that we have used the lower bound 1 for 1 1 κf . We use synthetic (syn-100, syn-1000) as well as real-world (ldc-100, ldc-1000) data sets. For syn-100, a bipartite graph G(V, W, E) with |V | = |W| = 100 is randomly generated by adding one edge between v V and w W independently with probability 0.05; the target set O with |O| = 20 is randomly chosen from W. For syn-1000, |V | = |W| = 1000, |O| = 100 and each edge is added with probability 0.01. ldc-100 and ldc-1000 contain 100 and 1000 English sentences, respectively, which are randomly sampled from the LDC2002E18 text data (https://www.ldc.upenn.edu/). Figure 1 shows the percentages of the solution quality that PORM improves from Greed Ratio, where we can observe that PORM is always better than Greed Ratio and can have more than 3% improvement. Comparing the running time (in the number of function calculations), Greed Ratio takes the time in the order of n2; PORM is set to use 3en2(2 + log |Γ(Xinit)|) time according to the theoretical upper bound (i.e., a worst case) for PORM being good. We empirically examine how effective PORM is in practice. By selecting Greed Ratio as the baseline, we plot the curve of the F-measure over the running time for PORM on syn-100 and ldc-100 with p = 0.8, as shown in Figure 2. The x-axis is in n2, the running time of Greed Ratio. We can observe that PORM takes about only 6% (3/52) and 3% (2/69) of the worst-case running time to achieve a better performance, respectively. This implies that PORM can be efficient in practice. |
| Researcher Affiliation | Academia | Chao Qian1, Jing-Cheng Shi2, Yang Yu2, Ke Tang1, Zhi-Hua Zhou2 1UBRI, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China 2National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China {chaoqian, ketang}@ustc.edu.cn, {shijc, yuy, zhouzh}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 Greed Ratio Algorithm Input: monotone submodular functions f, g : 2V R+ Output: a subset X V Process: ... Algorithm 2 PORM algorithm Input: a monotone submodular function f : {0, 1}n R+ and a monotone function g : {0, 1}n R+ Parameter: the number T of iterations Output: a solution x {0, 1}n Process: ... |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code for the described methodology is publicly available. |
| Open Datasets | Yes | ldc-100 and ldc-1000 contain 100 and 1000 English sentences, respectively, which are randomly sampled from the LDC2002E18 text data (https://www.ldc.upenn.edu/). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes how synthetic datasets are generated and real-world datasets are sampled, but not how they are partitioned for experimental evaluation. |
| Hardware Specification | No | The paper does not specify the hardware used for its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For PORM, the number T of iterations is set to 3en2(2+log |Γ(Xinit)|) (where Xinit is the initial subset generated by PORM), as suggested by Theorem 2. ... The generalized form of F-measure is used for evaluation, i.e., Fp(X) = (|Γ(X) O|)/(p|O|+(1 p)|Γ(X)|). We will test p = {0.2, . . . , 0.8}. For syn-100, a bipartite graph G(V, W, E) with |V | = |W| = 100 is randomly generated by adding one edge between v V and w W independently with probability 0.05; the target set O with |O| = 20 is randomly chosen from W. For syn-1000, |V | = |W| = 1000, |O| = 100 and each edge is added with probability 0.01. |