Position: Stop Making Unscientific AGI Performance Claims

Authors: Patrick Altmeyer, Andrew M. Demetriou, Antony Bartlett, Cynthia C. S. Liem

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We probe models of varying complexity including random projections, matrix decompositions, deep autoencoders and transformers: all of them successfully distill information that can be used to predict latent or external variables and yet none of them have previously been linked to AGI. We argue and empirically demonstrate that the finding of meaningful patterns in latent spaces of models cannot be seen as evidence in favor of AGI. (A minimal random-projection probe sketch follows this table.)
Researcher Affiliation | Academia | Department of Intelligent Systems, Delft University of Technology, Delft, the Netherlands.
Pseudocode | No | The paper describes methods and processes in prose (e.g., Sections 3.1 and A.2.2) and uses mathematical formulations, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | All our code will be made publicly available. For the time being, an anonymized version of our code repository can be found here: https://anonymous.4open.science/r/spurious_sentience/README.md.
Open Datasets | Yes | In our first example, we simulate this scenario, stopping short of training the model. In particular, we take the world_place.csv that was used in Gurnee & Tegmark (2023b), which maps locations/areas to their latitude and longitude... Closely following the approach in Gurnee & Tegmark (2023b), we apply it to the novel Trillion Dollar Words (Shah et al., 2023) financial dataset... To understand the nature of this low-dimensional projection, we collect daily Treasury par yield curve rates at all available maturities from the US Department of the Treasury. (A hedged data-loading sketch follows this table.)
Dataset Splits | Yes | A hold-out set is reserved for testing, on which we compute predicted coordinates for each sample as $\widehat{\mathrm{coord}} = Z_{\mathrm{test}} W$ and plot these on a world map (Figure 1)... The training subset contains 3,374 randomly drawn samples, while the remaining 843 are held out for testing... To account for stochasticity, we use an expanding window scheme with 5 folds for each indicator and layer. (An evaluation sketch follows this table.)
Hardware Specification | Yes | All of the experiments were conducted on a MacBook Pro, 14-inch, 2023, with an Apple M2 Pro chip and 16GB of RAM.
Software Dependencies | No | The paper mentions specific tools and models like "FOMC-RoBERTa (a fine-tuned version of RoBERTa)" and uses the "Adam optimizer (Kingma & Ba, 2017)", but it does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or specific library versions.
Experiment Setup | Yes | The initial feature matrix $X_{(n \times m)}$ is made up of n = 4,217 and m = 10 features. We add a total of 490 random features to X to simulate the fact that not all features ingested by Llama-2 are necessarily correlated with geographical coordinates. That yields 500 features in total... The single hidden layer of the untrained neural network has 400 neurons... We use Ridge regression with λ set to 0.1... The encoder consists of a single fully connected hidden layer with 32 neurons and a hyperbolic tangent activation function. The bottleneck layer connecting the encoder to the decoder is a fully connected layer with 6 neurons... We train the model over 1,000 epochs to minimize mean squared error loss using the Adam optimizer (Kingma & Ba, 2017). (A setup sketch follows this table.)
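
The research-type entry quotes the paper's central empirical claim: even an untrained random projection of input features yields a latent space from which an external variable can be linearly decoded. Below is a minimal sketch of that idea, not the authors' code; the synthetic data, shapes, tanh nonlinearity, and ridge penalty are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, m, d = 4000, 10, 400            # samples, input features, latent width (all assumed)
X = rng.normal(size=(n, m))        # observed features
y = X @ rng.normal(size=m)         # an "external" variable correlated with X

W_rand = rng.normal(size=(m, d))   # fixed random projection, never trained
Z = np.tanh(X @ W_rand)            # latent activations of the random projection

Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, test_size=0.2, random_state=0)
probe = Ridge(alpha=0.1).fit(Z_tr, y_tr)
print("held-out R^2 of a linear probe on a random projection:", probe.score(Z_te, y_te))
```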
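The open-datasets entry names three sources: world_place.csv from Gurnee & Tegmark (2023b), the Trillion Dollar Words corpus (Shah et al., 2023), and daily Treasury par yield curve rates. A hedged loading sketch follows; the file paths and column names ("latitude", "longitude", "Date") are assumptions about the CSV layout, not details taken from the paper.

```python
import pandas as pd

# Locations/areas mapped to latitude and longitude (Gurnee & Tegmark, 2023b).
places = pd.read_csv("world_place.csv")                  # assumed local copy of the file
coords = places[["latitude", "longitude"]].to_numpy()    # assumed column names

# Daily Treasury par yield curve rates, one column per available maturity.
yields = (
    pd.read_csv("treasury_par_yield_curve_rates.csv",    # assumed file name
                parse_dates=["Date"])                     # assumed date column
    .set_index("Date")
    .sort_index()
)
print(places.shape, yields.shape)
```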
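For the dataset-splits entry, the sketch below walks through the two quoted evaluation schemes: a random 3,374/843 train/hold-out split with predictions $\widehat{\mathrm{coord}} = Z_{\mathrm{test}} W$ from a ridge probe, and an expanding-window scheme with 5 folds. The latent matrix is synthetic, and scikit-learn's TimeSeriesSplit stands in for the authors' expanding-window implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
Z = rng.normal(size=(4217, 500))        # stand-in latent activations
coords = rng.normal(size=(4217, 2))     # stand-in latitude/longitude targets

# (i) Random split: 3,374 training samples, the remaining 843 held out for testing.
idx = rng.permutation(len(Z))
train, test = idx[:3374], idx[3374:]
probe = Ridge(alpha=0.1).fit(Z[train], coords[train])
coord_hat = Z[test] @ probe.coef_.T + probe.intercept_   # same as probe.predict(Z[test])
print("hold-out predictions:", coord_hat.shape)

# (ii) Expanding-window scheme with 5 folds (the training window grows each fold).
for fold, (tr, te) in enumerate(TimeSeriesSplit(n_splits=5).split(Z), start=1):
    r2 = Ridge(alpha=0.1).fit(Z[tr], coords[tr]).score(Z[te], coords[te])
    print(f"fold {fold}: R^2 = {r2:.3f}")
```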
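Finally, the experiment-setup entry can be read as a concrete recipe. The sketch below mirrors it under stated assumptions: 10 informative features padded with 490 random ones, an untrained single-hidden-layer network with 400 neurons whose activations are probed with Ridge regression (λ = 0.1), and an autoencoder with a 32-neuron tanh encoder and a 6-neuron bottleneck trained for 1,000 epochs with Adam on mean squared error. The decoder architecture, weight scaling, and synthetic targets are assumptions not stated in the quoted text.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 4217
X_info = rng.normal(size=(n, 10))                       # the m = 10 informative features
X = np.hstack([X_info, rng.normal(size=(n, 490))])      # + 490 random features -> 500 total
coords = X_info @ rng.normal(size=(10, 2))              # stand-in latitude/longitude targets

# Untrained network: one hidden layer with 400 neurons and random, never-updated weights.
W1 = rng.normal(size=(500, 400)) / np.sqrt(500)         # weight scaling is an assumption
Z = np.tanh(X @ W1)
probe = Ridge(alpha=0.1).fit(Z, coords)                 # lambda = 0.1 as quoted
print("in-sample probe R^2 on the untrained network:", probe.score(Z, coords))

# Autoencoder: 32-neuron tanh encoder, 6-neuron bottleneck, mirrored (assumed) decoder.
autoencoder = nn.Sequential(
    nn.Linear(500, 32), nn.Tanh(),   # encoder hidden layer
    nn.Linear(32, 6),                # bottleneck
    nn.Linear(6, 32), nn.Tanh(),     # decoder hidden layer (assumed symmetric)
    nn.Linear(32, 500),
)
optimizer = torch.optim.Adam(autoencoder.parameters())
X_t = torch.tensor(X, dtype=torch.float32)
for _ in range(1000):                # 1,000 full-batch epochs minimizing MSE
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(X_t), X_t)
    loss.backward()
    optimizer.step()
print("final reconstruction MSE:", float(loss))
```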