Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

A Compressive-Expressive Communication Framework for Compositional Representations

Authors: Rafael Elberg, Felipe del Rio, Mircea Petrache, Denis Parra

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our method significantly improves both the efficiency and compositionality of the learned messages on the Shapes3D and MPI3D datasets, surpassing prior discrete communication frameworks in both reconstruction accuracy and topographic similarity.
Researcher Affiliation	Academia	Rafael Elberg (Pontificia Universidad Católica, CENIA, i-Health; Chile); Felipe del Rio (Pontificia Universidad Católica, CENIA; Chile); Mircea Petrache (Pontificia Universidad Católica, CENIA; Chile); Denis Parra (Pontificia Universidad Católica, CENIA, i-Health; Chile)
Pseudocode	No	The paper describes methods and formulas but does not include any explicitly labeled pseudocode or algorithm blocks. The procedural descriptions are given in paragraph form.
Open Source Code	Yes	Our official implementation can be found at https://github.com/SugarFreeManatee/CELEBI
Open Datasets	Yes	Shapes3D We tested our framework using the Shapes3D [50] dataset, which consists of colored images of 3D geometric shapes... MPI3D We also evaluated using the compositional split of the MPI3D dataset [23].
Dataset Splits	Yes	For both datasets we use the compositional split from Schott et al. [62], which ensures all attribute values appear in training, yet some combinations are reserved for the testing set
Hardware Specification	Yes	All experiments were conducted on our internal laboratory cluster, using NVIDIA A40 GPUs with 48 GB of VRAM. Each job was allocated a single GPU with 8 GB of VRAM usage on average, alongside access to a CPU with 20 cores, 118 GB of RAM, and local SSD storage for datasets and model checkpoints.
Software Dependencies	Yes	Sender and Receiver are implemented using the EGG framework [32]. Input images are encoded into latent representations using a pretrained VAE visual backbone. The VAE is implemented using the disentanglement_lib [47] Python library... we conducted permutation tests using the SciPy library [67].
Experiment Setup	Yes	Sender and Receiver: Both consist of a single-layer LSTM with an embedding size of 64 and a hidden size of 256. Outputs are passed through a two-layer MLP with ReLU activations. Communication Channel: The vocabulary size is set to |V| = 15 and messages are composed of C = 10 discrete symbols. VAE Training: The VAE is pretrained for 15 epochs using the Adam optimizer with a learning rate of 1 × 10⁻³. The interaction-imitation games were run for a maximum of 100 iterations, alternating one full epoch in each phase per iteration. We used early stopping based on the validation MSE of the final reconstruction, starting from epoch 5, with a minimum δ of 1 × 10⁻³ and patience of 5. We used the Adam optimizer with default parameters and learning rates of 1 × 10⁻³ and 1 × 10⁻⁴ for Shapes3D [6] and MPI3D [23], respectively... Therefore, we set the Interaction and Imitation batch sizes to 256 and 512, respectively.
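
The values quoted in the Experiment Setup row can be collected into a small configuration sketch, shown below with an illustrative early-stopping rule. This is not taken from the authors' repository: the names `ExperimentSetup`, `LEARNING_RATES`, and `EarlyStopper` are hypothetical. The stopping semantics are also assumptions, namely 0-based epochs, no monitoring before the start epoch, and an improvement counted only when the validation MSE falls more than the minimum δ below the best value seen so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    embedding_size: int = 64           # LSTM embedding size
    hidden_size: int = 256             # LSTM hidden size
    vocab_size: int = 15               # |V|
    message_length: int = 10           # C discrete symbols per message
    max_iterations: int = 100          # interaction-imitation iterations
    interaction_batch_size: int = 256
    imitation_batch_size: int = 512
    early_stop_start_epoch: int = 5
    early_stop_min_delta: float = 1e-3
    early_stop_patience: int = 5


# Adam learning rates per dataset, as quoted in the table.
LEARNING_RATES = {"Shapes3D": 1e-3, "MPI3D": 1e-4}


class EarlyStopper:
    """Illustrative early stopping on validation MSE (assumed semantics, see lead-in)."""

    def __init__(self, min_delta: float, patience: int, start_epoch: int) -> None:
        self.min_delta = min_delta
        self.patience = patience
        self.start_epoch = start_epoch
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, epoch: int, val_mse: float) -> bool:
        # Assumption: epochs before start_epoch are not monitored at all.
        if epoch < self.start_epoch:
            return False
        if val_mse < self.best - self.min_delta:
            # Improvement larger than min_delta: record it and reset the counter.
            self.best = val_mse
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Under these assumptions, a run whose validation MSE stops improving after epoch 5 would be halted at epoch 10, once five consecutive epochs have failed to improve on the best value by more than δ.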