Discovery of the Hidden World with Large Language Models

Authors: Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, Kun Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We construct and curate several synthetic and real-world benchmarks, including analysis of human reviews and diagnosis of neuropathic and brain tumors, to comprehensively evaluate COAT. Extensive empirical results confirm the effectiveness and reliability of COAT with significant improvements.
Researcher Affiliation | Academia | Chenxi Liu (1), Yongqiang Chen (2,3,4), Tongliang Liu (5,2), Mingming Gong (6,2), James Cheng (4), Bo Han (1), Kun Zhang (2,3). Affiliations: 1 TMLR Group, Hong Kong Baptist University; 2 Mohamed bin Zayed University of Artificial Intelligence; 3 Carnegie Mellon University; 4 The Chinese University of Hong Kong; 5 Sydney AI Centre, The University of Sydney; 6 The University of Melbourne. Emails: {cscxliu,bhanml}@comp.hkbu.edu.hk; {yqchen,jcheng}@cse.cuhk.edu.hk; tongliang.liu@sydney.edu.au; mingming.gong@unimelb.edu.au; kunz1@cmu.edu.
Pseudocode | Yes | Algorithm 1: The COAT Framework
Open Source Code | Yes | https://causalcoat.github.io
Open Datasets | Yes | Benchmark construction: In the Apple Gastronome benchmark, we consider the target variable as a rating score of the apple by gastronomes... We generated 200 samples for LLMs analysis and annotation... Neuropathic (semi-real-world data): tabular samples from Tu et al. [26] on their GitHub repo under CC-BY 4.0; textual samples are generated by this paper... Brain Tumor (real-world data): an open Kaggle dataset (kaggle/brain-tumor-classification-mri) with an open-sourced project [38], MIT License... Stock News (real-world data): an open Kaggle dataset (kaggle/stock-price-and-news-realted-to-it) [39], MIT License... ENSO (real-world data): the NOAA 20th Century Reanalysis V3 dataset [40] from their website at https://psl.noaa.gov (CC0 1.0 License).
Dataset Splits | No | The paper describes using a portion of the data for 'LLMs analysis and annotation' and a portion for 'evaluation' in the Stock News case study, implying a training/testing split, but it does not mention a separate validation split.
Hardware Specification | Yes | We utilized a system comprising two Intel Xeon E5-2630v4 processors (2.2 GHz), two NVIDIA Tesla P40 GPUs, and 256 GB of memory.
Software Dependencies | No | We use a third-party open-sourced Python library to perform the FCI algorithm: https://causallearn.readthedocs.io/en/latest/. We set α = 0.05 and independence test method="fisherz" throughout all experiments; other parameters are kept as the default. While the paper mentions the `causal-learn` library, it does not specify a version number for the library or for Python itself. (A hedged sketch of this FCI call appears after the table.)
Experiment Setup | Yes | We set α = 0.05 and independence test method="fisherz" throughout all experiments; other parameters are kept as the default. We set the number of clusters to be one plus the number of current factors. The samples in each group are randomly selected up to a fixed number (e.g., 3 samples per group). (A sketch of this cluster-and-sample step follows the FCI example below.)
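
The Software Dependencies row pins the causal-discovery step to `causal-learn`'s FCI with α = 0.05 and the Fisher-z independence test, with all other parameters at their defaults. The snippet below is a minimal sketch of that call, not the authors' released code; the synthetic `data` array and the printed outputs are illustrative assumptions.

```python
# Hedged sketch of the FCI configuration quoted above (alpha = 0.05, Fisher-z test).
# The random data stands in for the LLM-annotated factor table; it is NOT from the paper.
import numpy as np
from causallearn.search.ConstraintBased.FCI import fci

# Hypothetical annotated table: rows = samples, columns = proposed factors + target.
data = np.random.randn(200, 5)

# Settings stated in the paper; remaining parameters are left at their defaults.
g, edges = fci(data, independence_test_method="fisherz", alpha=0.05)

print(g.graph)        # adjacency matrix of the resulting PAG
for edge in edges:    # oriented/partially oriented edges found by FCI
    print(edge)
```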
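
The Experiment Setup row describes the feedback sampling as clustering into (number of current factors + 1) groups and drawing a fixed number of samples (e.g., 3) at random from each group. The sketch below illustrates that step under the assumption that standard k-means (scikit-learn) is used for the grouping; the helper name `sample_groups` and the random features are hypothetical, since the paper's quoted text does not name a clustering implementation.

```python
# Hedged sketch of the cluster-and-sample step described in the Experiment Setup row.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def sample_groups(features: np.ndarray, n_current_factors: int, per_group: int = 3):
    """Cluster samples into (n_current_factors + 1) groups and randomly draw a
    fixed number of samples from each group, as stated in the quoted setup."""
    n_clusters = n_current_factors + 1
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    groups = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        take = min(per_group, len(idx))
        groups.append(rng.choice(idx, size=take, replace=False))
    return groups

# Illustrative usage: random features standing in for 200 annotated samples, 3 current factors.
groups = sample_groups(np.random.randn(200, 4), n_current_factors=3, per_group=3)
```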