Topic Models to Infer Socio-Economic Maps
Authors: Lingzi Hong, Enrique Frias-Martinez, Vanessa Frias-Martinez
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the accuracy of the approaches proposed, we use two datasets: a large-scale spatio-temporal dataset containing one month of calling activity for three cities from the same country, and the socio-economic map for those three cities containing regional SEL information. ... Table 1 shows the results for the four approaches using regression to infer SELs as continuous values. The results reported are for 20 topics for PMB-LDA (RF and SVR) and 25 for PMBSEL-s LDA, which turned out to be the number of topics that had the best results in terms of accuracy (R²) as shown in Figure 2. ... On the other hand, Table 2 shows the accuracies and F1 scores for all four approaches when SELs are defined as three discrete classes: A, B and C (from high to low socio-economic level). |
| Researcher Affiliation | Collaboration | Lingzi Hong, College of Information Studies, University of Maryland (lzhong@umd.edu); Enrique Frias-Martinez, Telefonica Research, Madrid, Spain (enrique.friasmartinez@telefonica.com); Vanessa Frias-Martinez, College of Information Studies, University of Maryland (vfrias@umd.edu) |
| Pseudocode | Yes | Algorithm 1 PMBSEL-s LDA ... Algorithm 2 PMB-LDA ... Algorithm 3 PF |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code for their methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | For privacy reasons involving the use of cell phone data at large-scale, we cannot reveal the name of the country. The spatio-temporal dataset contains a total of 134M calls and 1.8M individuals; while the SEL map contains a total of 186 regions distributed across the three cities. |
| Dataset Splits | Yes | To test the accuracy of this approach, we randomly divide the set of regions in the area under study into a training and a testing set (75%/25%) and repeat it 100 times. We use the training set for PMBSEL-s LDA model estimation and the testing set for the inference of SELs from mobility motifs and topics, and report average accuracy values across all runs. For continuous SEL values, we report R² and RMSE. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions LIBSVM and the methods Random Forest Regression (RFR), Support Vector Regression (SVR), and Random Forest (RF), but it does not specify version numbers for any of the software components or libraries used. |
| Experiment Setup | Yes | For Support Vector Regression, we used a Gaussian RBF kernel and the parameters (C, γ, ϵ) were selected using 5-fold cross validation to minimize the mean squared error. For Random Forest, the results are reported for 8 random trees in PMB-SEL; 146 trees in PF and 14 trees in PF2. |
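The evaluation protocol in the Dataset Splits row (a random 75%/25% split of regions, repeated 100 times, with R² and RMSE averaged over runs) can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' code: the regions are synthetic, each carries a single hypothetical feature standing in for the topic or motif features, and `fit_predict_linear` is a hypothetical 1-D least-squares stand-in for the RF/SVR regressors used in the paper.

```python
import math
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def fit_predict_linear(train, test_x):
    """Hypothetical stand-in regressor: 1-D ordinary least squares.

    `train` is a list of (feature, sel) pairs; returns one prediction per test_x.
    """
    x_bar = mean([x for x, _ in train])
    y_bar = mean([y for _, y in train])
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in test_x]


def repeated_holdout(regions, fit_predict, train_frac=0.75, runs=100, seed=0):
    """Shuffle and split the regions `runs` times; return mean R2 and mean RMSE."""
    rng = random.Random(seed)
    n_train = round(train_frac * len(regions))
    r2_runs, rmse_runs = [], []
    for _ in range(runs):
        shuffled = list(regions)
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        test_x = [x for x, _ in test]
        test_y = [y for _, y in test]
        preds = fit_predict(train, test_x)  # model sees training regions only
        r2_runs.append(r2_score(test_y, preds))
        rmse_runs.append(rmse(test_y, preds))
    return mean(r2_runs), mean(rmse_runs)


def make_synthetic_regions(n, seed=1):
    """Hypothetical data: one feature per region, SEL = 2 * feature + noise."""
    rng = random.Random(seed)
    regions = []
    for _ in range(n):
        x = rng.random()
        regions.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))
    return regions


# 186 regions, matching the size of the SEL map described in the paper.
regions = make_synthetic_regions(186)
avg_r2, avg_rmse = repeated_holdout(regions, fit_predict_linear)
print(f"mean R2 = {avg_r2:.3f}, mean RMSE = {avg_rmse:.3f}")
```

Averaging over many random splits, rather than reporting a single split, reduces the dependence of the reported R² and RMSE on which of the 186 regions happen to land in the test set.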
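The hyperparameter selection in the Experiment Setup row (5-fold cross-validation choosing the SVR parameters (C, γ, ϵ) to minimize mean squared error) is an exhaustive grid search. The sketch below shows only that selection loop, under stated assumptions: because training an actual SVR requires a quadratic-programming solver such as LIBSVM, the model being tuned here is a hypothetical stand-in (`rbf_smoother`, a Nadaraya-Watson smoother with RBF weights, tuned over γ only), and the data are synthetic.

```python
import itertools
import math
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(data, fit_predict, param_grid, k=5, seed=0):
    """Return the parameter combination with the lowest k-fold mean MSE."""
    folds = kfold_indices(len(data), k, random.Random(seed))
    names = sorted(param_grid)
    best_params, best_mse = None, math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_mses = []
        for fold in folds:
            held_out = set(fold)
            train = [d for i, d in enumerate(data) if i not in held_out]
            val = [data[i] for i in fold]
            preds = fit_predict(train, [x for x, _ in val], **params)
            errors = [(y - p) ** 2 for (_, y), p in zip(val, preds)]
            fold_mses.append(sum(errors) / len(errors))
        cv_mse = sum(fold_mses) / len(fold_mses)
        if cv_mse < best_mse:
            best_params, best_mse = params, cv_mse
    return best_params, best_mse


def rbf_smoother(train, test_x, gamma):
    """Hypothetical stand-in for SVR: Nadaraya-Watson regression with RBF weights.

    Squared distances are shifted by their minimum before exponentiating, so the
    largest weight is exactly 1 and the normalizer can never underflow to zero.
    """
    preds = []
    for x in test_x:
        sq_dists = [(x - xt) ** 2 for xt, _ in train]
        d_min = min(sq_dists)
        weights = [math.exp(-gamma * (d - d_min)) for d in sq_dists]
        preds.append(sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights))
    return preds


# Hypothetical synthetic data: 186 points of a noisy sine curve.
rng = random.Random(1)
data = []
for _ in range(186):
    x = rng.random()
    data.append((x, math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1)))

best_params, best_mse = grid_search_cv(
    data, rbf_smoother, {"gamma": [0.1, 1.0, 10.0, 100.0, 1000.0]}, k=5
)
print(best_params, round(best_mse, 4))
```

With scikit-learn, a comparable search over SVR itself would use `GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5, scoring="neg_mean_squared_error")`, with `param_grid` listing candidate values for `C`, `gamma`, and `epsilon`. The paper excerpt does not report which grid values were tried.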
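For the discrete setting described in the Research Type row, the reported metrics are accuracy and F1 over the classes A, B, and C. The paper excerpt does not say how continuous SELs were binned or how F1 was averaged across classes, so the sketch below assumes tertile cut points (`sel_to_class`, hypothetical) and reports per-class one-vs-rest F1 plus their unweighted (macro) mean. The SEL values and predictions are illustrative only.

```python
def sel_to_class(values):
    """Hypothetical binning: top third -> "A", middle -> "B", bottom -> "C"."""
    ranked = sorted(values)
    n = len(ranked)
    low_cut, high_cut = ranked[n // 3], ranked[2 * n // 3]
    return ["A" if v >= high_cut else "B" if v >= low_cut else "C" for v in values]


def accuracy(y_true, y_pred):
    """Fraction of regions whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_per_class(y_true, y_pred, labels=("A", "B", "C")):
    """One-vs-rest F1 = 2TP / (2TP + FP + FN) for each class label."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Illustrative values only: six regions with hypothetical continuous SELs.
y_true = sel_to_class([0.9, 0.8, 0.5, 0.4, 0.1, 0.2])
y_pred = ["A", "B", "B", "B", "C", "A"]
f1 = f1_per_class(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)
print(accuracy(y_true, y_pred), f1, round(macro_f1, 3))
```

Here four of six regions are classified correctly (accuracy 4/6), and the per-class F1 scores are 0.5 for A, 0.8 for B, and 2/3 for C.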