Unsupervised Legal Evidence Retrieval via Contrastive Learning with Approximate Aggregated Positive

Authors: Feng Yao, Jingyuan Zhang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Yun Liu, Weixing Shen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train our models on tens of thousands of unlabeled cases and evaluate them on a labeled dataset containing 919 cases and 4,336 queries. Experimental results indicate that our approach is effective and outperforms other state-of-the-art representation and retrieval models. The dataset and code are available at https://github.com/yaof20/LER.
Researcher Affiliation | Collaboration | Feng Yao (1), Jingyuan Zhang (2)*, Yating Zhang (2), Xiaozhong Liu (3), Changlong Sun (2), Yun Liu (1)*, Weixing Shen (1)*. (1) School of Law, Institute for AI and Law, Tsinghua University, Beijing, China; (2) DAMO Academy, Alibaba Group, Hangzhou, Zhejiang, China; (3) Worcester Polytechnic Institute, MA, USA
Pseudocode | No | The paper describes its methodology using equations and textual explanations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | The dataset and code are available at https://github.com/yaof20/LER. To motivate other scholars, the dataset and code are publicly available.
Open Datasets | Yes | To facilitate the research of LER, we propose a large-scale dataset named LERD, consisting of more than 300k fact queries and over 11 million query (fact) and candidate (evidence) pairs, within which 4,436 queries and their corresponding 234,693 candidates are annotated with relevance ranking scores. The dataset and code are available at https://github.com/yaof20/LER.
Dataset Splits | Yes | In the unsupervised setting, we use LERD-usp for training and split LERD-sup into validation and test sets for evaluation; for the supervised experiments, we split LERD-sup into train, validation, and test sets. The statistics of the data splits in both settings are shown in Table 2. (An illustrative split sketch follows the table.)
Hardware Specification | Yes | We train SWAP on 1 Tesla-A100 80G GPU with a batch size of 8 and optimize the model with AdamW with a learning rate of 1e-5, 10% steps for warmup and 5 epochs.
Software Dependencies | No | The paper mentions specific models like RoBERTa and optimizers like AdamW, but it does not specify the versions of general software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | During the training stage for SWAP, we use case-level examples to retain the structure information of each case in the mini-batch. We randomly sample cases from the training data and set the maximum input length of facts and evidence to 128 tokens. ... We train SWAP on 1 Tesla-A100 80G GPU with a batch size of 8 and optimize the model with AdamW with a learning rate of 1e-5, 10% steps for warmup and 5 epochs. The temperature hyper-parameter τ is 0.1. The dense representations of the facts and evidence are obtained by the average pooling strategy. We use cosine similarity as the function to measure the similarity between the fact and evidence representations. (A minimal training sketch follows the table.)
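
The Dataset Splits row quotes the split protocol but not the ratios or file layout. The sketch below only illustrates that protocol: the JSONL file names, the split ratios, and the random seed are assumptions for illustration, and the actual splits are released with the LER repository (https://github.com/yaof20/LER).

```python
# Hypothetical sketch of the data-split protocol quoted above.
# File names, ratios, and seed are assumptions, not values from the paper.
import json
import random

def load_cases(path):
    """Load one JSON object (a case) per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def split_cases(cases, ratios, seed=42):
    """Shuffle cases and partition them according to the given ratios."""
    rng = random.Random(seed)
    cases = cases[:]
    rng.shuffle(cases)
    splits, start = [], 0
    for r in ratios[:-1]:
        end = start + int(r * len(cases))
        splits.append(cases[start:end])
        start = end
    splits.append(cases[start:])
    return splits

# Unsupervised setting: train on unlabeled LERD-usp, evaluate on labeled LERD-sup.
usp_train = load_cases("LERD-usp.jsonl")
sup_valid, sup_test = split_cases(load_cases("LERD-sup.jsonl"), ratios=(0.5, 0.5))

# Supervised setting: LERD-sup itself supplies train, validation, and test sets.
sup_train, sup_valid, sup_test = split_cases(load_cases("LERD-sup.jsonl"), ratios=(0.8, 0.1, 0.1))
```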
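To make the quoted Experiment Setup concrete, here is a minimal PyTorch sketch of that configuration: masked average pooling, cosine similarity scaled by τ = 0.1, AdamW at 1e-5 with 10% warmup, batch size 8, 5 epochs, and 128-token inputs. The encoder checkpoint, the steps-per-epoch placeholder, and the plain in-batch InfoNCE objective are assumptions; this is not the paper's SWAP objective with approximate aggregated positives, whose reference implementation is at https://github.com/yaof20/LER.

```python
# Sketch of the quoted training configuration. The in-batch InfoNCE loss below
# is a generic stand-in for the paper's SWAP contrastive objective.
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from transformers import AutoModel, AutoTokenizer, get_linear_schedule_with_warmup

MODEL_NAME = "hfl/chinese-roberta-wwm-ext"  # assumed encoder; the paper only names RoBERTa
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME).to(device)

def encode(texts):
    """Encode texts and average-pool token embeddings into dense representations."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                      return_tensors="pt").to(device)
    hidden = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # masked average pooling

def contrastive_loss(fact_emb, evid_emb, tau=0.1):
    """In-batch contrastive loss over temperature-scaled cosine similarities."""
    sims = F.cosine_similarity(fact_emb.unsqueeze(1), evid_emb.unsqueeze(0), dim=-1) / tau
    targets = torch.arange(sims.size(0), device=sims.device)
    return F.cross_entropy(sims, targets)

# Optimizer and schedule as quoted: AdamW, lr 1e-5, 10% warmup steps, 5 epochs.
EPOCHS, BATCH_SIZE = 5, 8
steps_per_epoch = 1000                                      # placeholder; depends on the dataset
total_steps = EPOCHS * steps_per_epoch
optimizer = AdamW(encoder.parameters(), lr=1e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, int(0.1 * total_steps), total_steps)

for facts, evidences in []:                                 # replace with a DataLoader over case-level batches
    loss = contrastive_loss(encode(facts), encode(evidences))
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```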