London Lowmanstone
2 min read · Apr 8, 2023


Neural networks (including transformers) are not lookup tables. They generalize using a small number of parameters in ways that lookup tables cannot. A lookup table for determining what word comes next in any given sentence would need to be far bigger than the neural networks currently in use. I wrote a tutorial to help explain exactly what transformers do: they let the model represent each word in a way that takes the entire sentence into account, rather than just that one word. (https://medium.com/@london-lowmanstone/a-simpler-description-of-the-attention-mechanism-573e9e7f42b0)
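To make that "takes the entire sentence into account" point concrete, here is a minimal sketch of single-head self-attention (not the tutorial's code; the toy sizes and random weights are my own illustrative assumptions, not any real model's):

```python
# Minimal self-attention sketch: each word's vector is rebuilt as a
# weighted mix of every word in the sentence. Sizes are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                    # 5 tokens, 8-dim embeddings
x = rng.normal(size=(seq_len, d_model))    # stand-in word embeddings

# Projection matrices (random here; learned in a real transformer)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

q, k, v = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention: every token scores every other token...
scores = q @ k.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sentence

# ...and its new representation is a weighted sum over the whole sentence.
contextual = weights @ v
print(contextual.shape)   # (5, 8): one sentence-aware vector per word
```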

I like the analogy of the model being the room, including the person inside, with regard to "no part of the model understands what it is doing." We're not sure that's the case, but it seems likely given what we've observed in smaller models. However, I don't think the analogy holds in terms of the vastness of the room you would need in order to provide "lookup" capabilities.

When I mentioned the organization of the information, I wasn't talking about the training process. (I'm a second-year Ph.D. student in natural language processing; I know how it works.) I was talking about how the information is organized inside the LLM. In other words, does the transformer mechanism allow the model to organize the information intelligently, such that in order to predict the next word, all it has to do is the matrix multiplication?
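Here is what "all it has to do is the matrix multiplication" would mean at the output, as a hedged sketch (the dimensions and the name `W_unembed` are my own toy assumptions): if the information has already been organized into the final hidden vector, the next-word prediction is one multiply by an output matrix followed by a softmax.

```python
# Toy dimensions; a real model's hidden size and vocabulary are far larger.
# The point is only the shape of the final computation.
import numpy as np

rng = np.random.default_rng(1)
d_model, vocab_size = 8, 50            # illustrative assumptions
h_last = rng.normal(size=(d_model,))   # final hidden state for the last token
W_unembed = rng.normal(size=(d_model, vocab_size))  # learned output matrix

logits = h_last @ W_unembed            # the one matrix multiplication
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax over the vocabulary

next_token = int(probs.argmax())       # predicted next word (greedy choice)
print(next_token, probs[next_token])
```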

I think your final point, that the training process doesn't seem all that smart because it's just predicting the next word, is actually further evidence for my point: in needing to do really well at that task, ChatGPT has emerged with intelligence that goes beyond a lookup table.
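For reference, "just predicting the next word" boils down to an objective like the one below (a sketch under my own assumptions, not any particular implementation): cross-entropy between the model's distribution and the word that actually came next.

```python
# Next-token objective sketch: the loss is -log P(actual next word).
import numpy as np

def next_token_loss(logits: np.ndarray, target_id: int) -> float:
    """Cross-entropy for one position, computed from raw logits."""
    logits = logits - logits.max()                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())   # log-softmax
    return -log_probs[target_id]

# Toy example: 5-word vocabulary, the true next word is token 3.
logits = np.array([0.1, 0.2, 0.0, 2.0, -1.0])
print(next_token_loss(logits, target_id=3))  # low loss: model favored token 3
```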

What are your thoughts on this?
