Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DisenCite: Graph-Based Disentangled Representation Learning for Context-Specific Citation Generation
Authors: Yifan Wang, Yiping Song, Shuai Li, Chaoran Cheng, Wei Ju, Ming Zhang, Sheng Wang
Pages: 11449-11458
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superior performance of our method compared to state-of-the-art approaches. We further conduct ablation and case studies to confirm that the improvement of our method comes from generating the context-specific citation by incorporating the citation graph. |
| Researcher Affiliation | Academia | (1) School of Computer Science, Peking University, Beijing, China; (2) National University of Defense Technology; (3) Paul G. Allen School of Computer Science, University of Washington |
| Pseudocode | No | The paper describes its model components and logic using prose and mathematical equations but does not include a formal 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper states 'We release GCite, a graph enhanced contextual citation dataset...' with footnote 1 pointing to https://github.com/jamesyifan/DisenCite, but does not explicitly state that the source code for the methodology is available at this link. |
| Open Datasets | Yes | We construct a graph enhanced contextual citation dataset GCite, consisting of 25K relationships with different types... over 4.8K papers extracted from the computer science domain of S2ORC (Lo et al. 2020). We release GCite, a graph enhanced contextual citation dataset... (footnote 1: https://github.com/jamesyifan/DisenCite) |
| Dataset Splits | Yes | We randomly select 80% of citation relations to constitute the training set, and treat the remaining 10% and 10% as the validation and test sets, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'PyTorch', 'GRU', and 'Adam optimizer' but does not specify their version numbers. |
| Experiment Setup | Yes | The word embeddings are randomly initialized with dimension d = 50. We limit the input document length to 600 tokens, with each section (introduction, method, and experiment) limited to fewer than 200 tokens and the citation context to fewer than 50 tokens. For our method, we sample 2-hop neighborhoods of the target node pair as a subgraph, with the number of type-specific neighbors set to 5 and 4, respectively. The hyper-parameters are α = 1, β = 1e-1, γ = 1e-1, and dropout with probability p = 0.35 is applied to all parameters to prevent overfitting. We optimize DisenCite with the Adam optimizer, setting the initial learning rate to lr = 5e-3, and use early stopping with a patience of 20, i.e., we stop training if ROUGE-L on the validation set does not increase for 20 successive epochs. |
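To make the reported split and stopping rule concrete, the following is a minimal Python sketch, not the authors' released code. It collects the hyper-parameters from the Experiment Setup row into a config object, performs a random 80/10/10 split of citation relations, and applies early stopping on validation ROUGE-L with a patience of 20 epochs. All names (`SetupConfig`, `split_relations`, `train_with_early_stopping`), the `seed` and `max_epochs` arguments, and the reading of (5, 4) as per-hop neighbor counts are assumptions for illustration; the actual training step and ROUGE-L computation are left as caller-supplied callables.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class SetupConfig:
    # Values as reported in the Experiment Setup row; field names are hypothetical.
    embedding_dim: int = 50
    max_doc_tokens: int = 600
    max_section_tokens: int = 200
    max_context_tokens: int = 50
    num_hops: int = 2
    # Reported as "5 and 4, respectively"; treating them as per-hop counts is an assumption.
    neighbors_per_hop: Tuple[int, int] = (5, 4)
    alpha: float = 1.0
    beta: float = 1e-1
    gamma: float = 1e-1
    dropout: float = 0.35
    learning_rate: float = 5e-3
    patience: int = 20


def split_relations(
    relations: Sequence[T], seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Randomly split citation relations 80/10/10 into train/validation/test."""
    items = list(relations)
    random.Random(seed).shuffle(items)  # seeded shuffle (the seed is an assumption)
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float truncation surprises
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (about 10%) becomes the test set
    return train, val, test


def train_with_early_stopping(
    train_epoch: Callable[[], None],
    validate_rouge_l: Callable[[], float],
    patience: int = 20,
    max_epochs: int = 1000,
) -> Tuple[float, int]:
    """Stop once validation ROUGE-L has not increased for `patience` successive epochs.

    Returns the best validation score and the epoch index at which it occurred.
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_epoch()
        score = validate_rouge_l()
        if score > best_score:  # only a strict increase counts as improvement
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_score, best_epoch
```

For example, `split_relations(range(1000))` yields partitions of 800, 100, and 100 items. In practice one would also checkpoint the model at `best_epoch` and restore it after stopping, which the sketch omits.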