Explanations that reveal all through the definition of encoding
Authors: Aahlad Manas Puli, Nhi Nguyen, Rajesh Ranganath
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section consists of two parts. The first part demonstrates the weak and strong detection capabilities of the evaluations ROAR, EVAL-X, and STRIPE-X in a simulated setting and on an image recognition task. To demonstrate these capabilities, we run these evaluations on instantiations of POSI, PRED, and MARG. |
| Researcher Affiliation | Academia | Aahlad Puli, Nhi Nguyen, Rajesh Ranganath (New York University) |
| Pseudocode | Yes | Algorithm 1: ENCODE-METER, generative version. Algorithm 2: STRIPE-X, predictive version. |
| Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We only use public data. We use standard existing training techniques, describe the hyperparameters in detail, and provided the Llama 3 prompts we used in our experiments. |
| Open Datasets | Yes | We consider an image recognition task like the one in Figure 3 with labels and images from the cats_vs_dogs dataset from the Tensorflow package [32]. ... The base cat and dog images were obtained from the cats_vs_dogs dataset from the Tensorflow datasets package. |
| Dataset Splits | Yes | The training, validation, and test datasets consist of 8000, 1000, and 1000 samples respectively (see the data-loading sketch below the table). |
| Hardware Specification | Yes | The cats vs. dogs experiment was done on an A100 GPU where the whole training and evaluation ran in less than 20 minutes. All training and inference for this experiment was done on an A100. |
| Software Dependencies | No | The paper mentions using TensorFlow, GPT-2 models, and the AdamW optimizer, but does not provide specific version numbers for these software components. For example, it does not specify "TensorFlow 2.x" or "PyTorch 1.x". |
| Experiment Setup | Yes | The EVAL-X model is trained for 100 epochs with a batch size of 100 with the Adam optimizer, with the learning rate and weight decay parameters set to 10^-3 and 0 respectively. The pθ(F | xv, ℓ, v) model is trained for 50 epochs with a batch size of 200 with the Adam optimizer, with the learning rate and weight decay parameters set to 5 × 10^-5 and 1 respectively (see the training-setup sketch below the table). |
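The dataset rows above report the cats_vs_dogs dataset from TensorFlow Datasets and an 8000/1000/1000 train/validation/test split. The snippet below is a minimal sketch of how such splits could be produced; the image resolution, normalization, and shuffle seed are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch: load cats_vs_dogs from TensorFlow Datasets and carve out the
# reported 8000/1000/1000 train/validation/test splits.
# NOTE: the 128x128 resolution, normalization, and shuffle seed are assumptions;
# this report does not describe the authors' exact preprocessing.
import tensorflow as tf
import tensorflow_datasets as tfds

def preprocess(image, label):
    # Resize variable-sized images to a fixed shape and scale to [0, 1].
    image = tf.image.resize(image, (128, 128)) / 255.0
    return image, label

# cats_vs_dogs ships with a single "train" split in TensorFlow Datasets.
ds = tfds.load("cats_vs_dogs", split="train", as_supervised=True)

# Fix the shuffle once so take/skip yield disjoint, stable splits across epochs.
ds = ds.shuffle(1_000, seed=0, reshuffle_each_iteration=False).map(preprocess)

train_ds = ds.take(8000).batch(100)             # 8000 training samples
val_ds   = ds.skip(8000).take(1000).batch(100)  # 1000 validation samples
test_ds  = ds.skip(9000).take(1000).batch(100)  # 1000 test samples
```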
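The hyperparameters quoted in the Experiment Setup row can be wired up as below. This is a sketch under assumptions: it reuses `train_ds` and `val_ds` from the preceding data-loading sketch, the EVAL-X model architecture is a placeholder (the report does not describe it), and `tf.keras.optimizers.AdamW` (TensorFlow 2.11+) is used so the weight-decay value can be set explicitly, although the quoted text only says "Adam".

```python
# Sketch of the reported optimizer settings. The small dense network below is a
# placeholder for the EVAL-X model, whose architecture is not given in this report.
import tensorflow as tf

def build_placeholder_model():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(128, 128, 3)),  # matches the assumed preprocessing above
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(2),  # cat vs. dog logits
    ])

# EVAL-X model: 100 epochs, batch size 100, learning rate 1e-3, weight decay 0.
eval_x_model = build_placeholder_model()
eval_x_model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3, weight_decay=0.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# train_ds / val_ds come from the data-loading sketch above (batch size 100).
eval_x_model.fit(train_ds, validation_data=val_ds, epochs=100)

# The pθ(F | xv, ℓ, v) model is reported with 50 epochs, batch size 200,
# learning rate 5e-5, and weight decay 1; only its optimizer is shown here,
# since its inputs (masked image, label, mask) are not specified in this report.
masking_model_optimizer = tf.keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=1.0)
```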