Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Conformal Prediction Sets for Ordinal Classification
Authors: Prasenjit Dey, Srujana Merugu, Sivaramakrishnan R Kaveri
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on both synthetic and real-world datasets demonstrate that our method outperforms SOTA baselines by 4% on Accuracy@K and 8% on PS size. |
| Researcher Affiliation | Industry | Prasenjit Dey EMAIL Srujana Merugu EMAIL Sivaramakrishnan Kaveri EMAIL Amazon, India |
| Pseudocode | No | The paper describes methods and constructions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code or links to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate COPOC on four public image datasets: age-detection (Adience [13]), historical image dating (HCI [30]), image aesthetics estimation (Aesthetic [37]) and biomedical classification (Retina-MNIST [44]). |
| Dataset Splits | Yes | For each of these datasets, we split the data into train, calibration, and test sets. We use calibration set to calibrate APS and report mean and standard deviation (std. error) on the test set across 5 independent splits. |
| Hardware Specification | No | The paper describes model architectures (e.g., VGG-16, 6 layer DNN) but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and common ML practices, but does not specify any software libraries (e.g., PyTorch, TensorFlow) or their specific version numbers required for replication. |
| Experiment Setup | Yes | We trained models for 50 epochs with a batch size of 64. For optimization, Adam optimizer was utilized with a learning rate of 0.0001, with decay rate of 0.2. |
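The table notes that a calibration set is used to calibrate APS (Adaptive Prediction Sets). For readers unfamiliar with that step, here is a minimal sketch of standard split-conformal APS calibration and set construction. This is the generic recipe, not the paper's COPOC method (which additionally targets contiguous sets for ordinal labels); the function names and the toy data are illustrative.

```python
import numpy as np

def aps_score(probs, label):
    # APS non-conformity score: total probability mass of classes
    # ranked at least as likely as the true label.
    order = np.argsort(-probs)               # classes, most likely first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(probs))
    return np.cumsum(probs[order])[ranks[label]]

def calibrate_aps(cal_probs, cal_labels, alpha=0.1):
    # Conformal quantile of calibration scores for ~(1 - alpha) coverage.
    scores = np.array([aps_score(p, y) for p, y in zip(cal_probs, cal_labels)])
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, qhat):
    # Include classes, most likely first, until their mass reaches qhat.
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    k = np.searchsorted(cum, qhat) + 1
    return sorted(order[:k].tolist())
```

Calibration uses held-out data (the paper's calibration split); the resulting quantile `qhat` then determines, for each test point, how many of the most probable classes enter the prediction set.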
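Since the paper specifies the optimizer but not the software stack, the quoted setup (Adam, learning rate 0.0001) can be sketched framework-independently via the standard Adam update rule (Kingma & Ba). Only the learning rate comes from the quote; the remaining hyperparameters are Adam's usual defaults, and the quoted decay rate of 0.2 is a schedule detail the paper does not pin down, so it is omitted here.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update; lr=1e-4 matches the paper's quoted learning rate.
    # (The quoted "decay rate of 0.2" schedule is unspecified and not modeled.)
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

In a full replication this update would run over mini-batches of 64 for 50 epochs, per the quoted setup.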