Explanations that reveal all through the definition of encoding

Authors: Aahlad Manas Puli, Nhi Nguyen, Rajesh Ranganath

NeurIPS 2024

Reproducibility assessment: each variable below lists the extracted result followed by the supporting LLM response.

Research Type: Experimental
"This section consists of two parts. The first part demonstrates the weak and strong detection capabilities of the evaluations ROAR, EVAL-X, and STRIPE-X in a simulated setting and on an image recognition task. To demonstrate these capabilities, we run these evaluations on instantiations of POSI, PRED, and MARG."

Researcher Affiliation: Academia
"Aahlad Puli, Nhi Nguyen, Rajesh Ranganath (New York University)"

Pseudocode: Yes
"Algorithm 1: ENCODE-METER, generative version. Algorithm 2: STRIPE-X, predictive version."

Open Source Code: No
"Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We only use public data. We use standard existing training techniques, describe the hyperparameters in detail, and provided the Llama 3 prompts we used in our experiments."

Open Datasets: Yes
"We consider an image recognition task like the one in Figure 3 with labels and images from the cats_vs_dogs dataset from the Tensorflow package [32]. ... The base cat and dog images were obtained from the cats_vs_dogs dataset from the Tensorflow datasets package."

Dataset Splits: Yes
"The training, validation, and test datasets consist of 8000, 1000, and 1000 samples respectively."

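For concreteness, here is a minimal sketch of how an 8000/1000/1000 split of cats_vs_dogs could be produced with TensorFlow Datasets. The slice boundaries follow the quoted counts, but this pipeline is an assumption, not the authors' released code.

```python
import tensorflow_datasets as tfds

# Hypothetical reconstruction of the quoted 8000/1000/1000 split;
# cats_vs_dogs ships a single "train" split, so we slice it manually.
train_ds = tfds.load("cats_vs_dogs", split="train[:8000]", as_supervised=True)
val_ds = tfds.load("cats_vs_dogs", split="train[8000:9000]", as_supervised=True)
test_ds = tfds.load("cats_vs_dogs", split="train[9000:10000]", as_supervised=True)
```
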
Hardware Specification: Yes
"The cats vs. dogs experiment was done on an A100 GPU, where the whole training and evaluation ran in less than 20 minutes. All training and inference for this experiment was done on an A100."

Software Dependencies: No
"The paper mentions using TensorFlow, GPT-2 models, and the AdamW optimizer, but does not provide specific version numbers for these software components. For example, it does not specify 'TensorFlow 2.x' or 'PyTorch 1.x'."

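A release could close this gap simply by recording the installed versions. The snippet below is one illustrative way to do that; none of these packages' version numbers are reported in the paper.

```python
# Illustrative only: record the versions a reproducible release would pin.
import tensorflow as tf
import tensorflow_datasets as tfds

print("tensorflow:", tf.__version__)
print("tensorflow-datasets:", tfds.__version__)
```
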
Experiment Setup: Yes
"The EVAL-X model is trained for 100 epochs with a batch size of 100 with the Adam optimizer, with the learning rate and weight decay parameters set to 10⁻³ and 0 respectively. The pθ(F | xv, ℓ, v) model is trained for 50 epochs with a batch size of 200 with the Adam optimizer, with the learning rate and weight decay parameters set to 5 × 10⁻⁵ and 1 respectively."

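To make the quoted EVAL-X configuration concrete, here is a minimal Keras sketch. The architecture and data pipeline are placeholders (the text specifies only the optimizer, learning rate, weight decay, epoch count, and batch size), so treat this as an illustration rather than the authors' setup.

```python
import tensorflow as tf

# Placeholder model: the paper quotes hyperparameters, not this architecture.
eval_x_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),
])

# Quoted settings: Adam, learning rate 1e-3, weight decay 0
# (a weight decay of 0 makes this plain Adam).
eval_x_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# train_ds is assumed to yield (image, label) pairs, e.g. the split
# sketched above; batch size and epoch count follow the quoted text.
# eval_x_model.fit(train_ds.batch(100), epochs=100)
```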