Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Determining the Number of Latent Factors in Statistical Multi-Relational Learning
Authors: Chengchun Shi, Wenbin Lu, Rui Song
JMLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations and real data examples show that our proposed information criteria have good finite sample properties. Section 4. Numerical Experiments. In Section 4.2, we introduce our algorithm for computing the maximum likelihood estimators of a logistic RESCAL model. Simulation studies are presented in Section 4.3. In Section 4.4, we apply the proposed information criteria to a real dataset. Tables 1, 2, and 3 report numerical results from these experiments. |
| Researcher Affiliation | Academia | Chengchun Shi EMAIL Wenbin Lu EMAIL Rui Song EMAIL Department of Statistics North Carolina State University Raleigh, NC 27695, USA |
| Pseudocode | Yes | 4.1. Implementation. In this section, we propose an algorithm for computing {â_i^(s)}_i and {R̂_k^(s)}_k. The algorithm is based upon a 3-block alternating direction method of multipliers (ADMM). ... Applying the dual descent method yields the following steps, with l denoting the iteration number: {a_{i,l+1}^(s)}_{i=s+1}^n = argmin L_ρ(...) (11); {R_{k,l+1}^(s)}_{k=1}^K = argmin L_ρ(...) (12); {b_{i,l+1}^(s)}_{i=s+1}^n = argmin L_ρ(...) (13); v_{i,l+1}^(s) = v_{i,l}^(s) + a_{i,l}^(s) − b_{i,l}^(s), i = s+1, ..., n. |
| Open Source Code | No | The paper states: "The ADMM algorithm proposed in Section 4.1 is implemented in R. Some subroutines of the algorithm are written in C with the GNU Scientific Library (GSL, Galassi et al., 2015) to facilitate the computation." However, it does not explicitly state that the authors' implementation code is open-source, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In this section, we apply the proposed information criteria to the Social Evolution dataset (Madan et al., 2012). |
| Dataset Splits | Yes | For any s ∈ {1, ..., 12}, we randomly select 80% of the observations and estimate {â_i^(s)}_i and {R̂_k^(s)}_k by maximizing the observed likelihood function based on these training samples. Then we compute π̂_ijk = exp{(â_i^(s))^T R̂_k^(s) â_j^(s)} / [1 + exp{(â_i^(s))^T R̂_k^(s) â_j^(s)}]. Based on these predicted probabilities, we calculate the area under the precision-recall curve (AUC) on the remaining 20% testing samples. |
| Hardware Specification | No | The paper mentions that the algorithm is implemented in R and uses C subroutines with GSL, but it does not provide any specific details about the hardware (e.g., CPU, GPU, memory) used for running the experiments or simulations. |
| Software Dependencies | Yes | The ADMM algorithm proposed in Section 4.1 is implemented in R. Some subroutines of the algorithm are written in C with the GNU Scientific Library (GSL, Galassi et al., 2015) to facilitate the computation. |
| Experiment Setup | Yes | In our implementation, we set ρ = nK/2. We simulate {Y_ijk}_ijk from the following model: Pr(Y_ijk = 1 &#124; {a_i}_i, {R_k}_k) = exp(a_i^T R_k a_j) / (1 + exp(a_i^T R_k a_j)), with a_1, a_2, ..., a_n i.i.d. N(0, 1) and R_1 = R_2 = ⋯ = R_K = diag(1, 1, ..., 1), a diagonal matrix with s0 ones. We consider six simulation settings. In the first three settings, we fix K = 3 and set n = 100, 150 and 200, respectively. In the last three settings, we increase K to 10, 20, 50, and set n = 50. In each setting, we further consider three scenarios, by setting s0 = 2, 4 and 8. Let s_max = 12. In IC_α, we set α = 0, 0.5 and 1. |
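The simulation design and held-out evaluation quoted above can be sketched end to end. This is a minimal illustration, not the authors' R/C implementation: it simulates Y_ijk from the logistic RESCAL model with s0 latent factors, holds out 20% of the entries, and scores them with the area under the precision-recall curve. Fitting the ADMM maximum-likelihood estimator is out of scope here, so the true link probabilities stand in for the fitted π̂_ijk, and all function names are ours.

```python
import numpy as np

def simulate_rescal(n, K, s0, seed=0):
    """Draw Y_ijk ~ Bernoulli(sigmoid(a_i^T R_k a_j)), following the quoted
    design: a_i i.i.d. standard normal, R_k = diag(1, ..., 1) with s0 ones."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, s0))              # latent factors a_i
    logits = A @ A.T                              # a_i^T R_k a_j with R_k = identity
    p = 1.0 / (1.0 + np.exp(-logits))             # true link probabilities
    Y = (rng.random((K, n, n)) < p).astype(int)   # K relational slices, same p each
    return Y, p

def pr_auc(y_true, score):
    """Area under the precision-recall curve, computed as average precision."""
    order = np.argsort(-score)                    # rank entries by score, descending
    y = y_true[order]
    precision = np.cumsum(y) / np.arange(1, y.size + 1)
    return float(precision[y == 1].mean())        # mean precision at each positive

# Simulate one setting (n = 50, K = 3, s0 = 4) and hold out 20% of entries.
Y, p = simulate_rescal(n=50, K=3, s0=4)
y_all = Y.reshape(-1)
score_all = np.broadcast_to(p, Y.shape).reshape(-1)
rng = np.random.default_rng(1)
test_idx = rng.permutation(y_all.size)[: y_all.size // 5]
# True probabilities stand in for the fitted pi-hat_ijk in this sketch.
auc = pr_auc(y_all[test_idx], score_all[test_idx])
```

Average precision is used here because it equals the area under the precision-recall curve computed at every ranked threshold, avoiding a dependency on an external metrics library.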