Artificial General Intelligence (AGI): Memory
In this post, we will discuss the gaps in competencies we observe in AGI today.
If you have not yet read the previous post defining AGI, I highly recommend that you do so before continuing.
Let's begin by sharing some words from top AI scientists, which give a good overview of where we are currently, and where we are headed (emphasis added):
The models today are pretty capable—of course, we've all interacted with the language models, and now they're becoming multimodal models. I think there are still some missing attributes: things like reasoning, hierarchical planning, long-term memory. There's quite a few capabilities that the current systems... [are] not consistent across the board... [T]hey're very very strong in some things but they're still surprisingly weak and flawed in other areas, so you'd want an AGI to have pretty consistent robust behavior across the board of the cognitive tasks. I think one thing that's clearly missing that I always had as a benchmark for AGI was the ability for these systems to invent their own hypotheses or conjectures about science not just prove existing ones... having that kind of creative, inventive capability. — Demis Hassabis, CEO of Google DeepMind, Nobel Prize Winner
As I said before, if you think that we're going to get to human level AI by just training on more data and scaling up LLM, you're making a mistake... However there are ideas about how to go forward and have systems that are capable of doing what every intelligent animal and human are capable of doing and that current systems are not capable of doing. I'm talking about understanding the physical world, having persistent memory, being able to reason, and plan. Those are the four characteristics that need to be there. — Yann LeCun, Chief Scientist of Meta AI, Turing Award Winner
Reasoning, Planning, Memory
Reasoning, planning, and memory are three areas that both Hassabis and LeCun claim are deficient in existing AGI systems. It's worthwhile to discuss what each of them means. Like "intelligence," these terms are not precisely defined (if they were, progress would likely have been easier). Since the most capable AGI systems today are large language models (LLMs), and Hassabis and LeCun had these systems in mind when making their claims, the following discussion focuses on LLMs.
Memory
What sort of memory do LLMs have? One way LLMs have memory is the context window: users can give the model information to use within that session. Once you refresh the context window (e.g. start a new chat), you wipe that memory of the LLM clean. Another way LLMs have memory is through their training data: when you ask a question, the model draws on what it learnt during training.
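To make this concrete, here is a minimal sketch of context-window memory. The `call_llm` function is a hypothetical stand-in for whatever chat API you use; the point is only that the model's session memory lives in the list of messages you resend every turn, and resetting that list wipes it.

```python
# Minimal sketch: the context window as an LLM's only session memory.
# `call_llm` is a hypothetical placeholder for a real chat-model call.

def call_llm(messages: list[dict]) -> str:
    """Placeholder: send `messages` to a chat model and return its reply."""
    return f"(model reply, given {len(messages)} prior messages)"

messages = []  # the context window: everything the model "remembers" this session

messages.append({"role": "user", "content": "My name is Priya."})
messages.append({"role": "assistant", "content": call_llm(messages)})

messages.append({"role": "user", "content": "What is my name?"})
print(call_llm(messages))  # answerable only because earlier turns are resent

messages = []  # "new chat": the short-term memory is gone
```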
Short-term and Long-term Memory
We often think of human memory in terms of short-term and long-term memory. These are helpful analogies for LLM memory: we can say that an LLM has short-term memory in the context window, and long-term memory stored in the model weights learned from its training data.
But LLMs do not form long-term memory outside of training, which is widely considered a flaw. In humans, short-term memory is consolidated into long-term memory, and this happens continuously and automatically. For LLMs today, that's not the case.
Integrating New Experiences
There is some engineering you can do to mimic that process. For example, one way to give the illusion of long-term memory is to extract key elements from your past interactions with the LLM and supply them as context in your next interaction. You can automate this, but you are still limited by the length of the context window, which is short-term memory. In other words, we are using fancy engineering to give the experience of long-term memory on top of the LLM's short-term memory; we might not consider that the same as having genuine long-term memory, which would look more like updating the weights of the LLM based on its interactions.
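As a rough sketch of that idea (not any particular product's implementation), the loop below distills notes from past sessions and re-injects them into the prompt of the next one. `summarize` and `call_llm` are hypothetical placeholders for real model calls, and the character budget stands in for the context-window limit.

```python
# Rough sketch of simulated long-term memory: distill notes from past sessions
# and re-inject them into the context window of the next session.
# `summarize` and `call_llm` are hypothetical placeholders for real model calls.

MAX_MEMORY_CHARS = 2000  # stand-in for the context-window budget

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-model call."""
    return f"(reply to: {prompt[:40]}...)"

def summarize(transcript: list[str]) -> str:
    """Placeholder: in practice, ask a model to extract facts worth remembering."""
    return " ".join(transcript)[-200:]

long_term_notes: list[str] = []  # persisted between sessions (e.g. written to disk)

def chat_session(user_messages: list[str]) -> None:
    transcript: list[str] = []
    for msg in user_messages:
        # Re-inject stored notes, trimmed to fit the limited context window.
        memory_block = " ".join(long_term_notes)[:MAX_MEMORY_CHARS]
        reply = call_llm(f"Known facts about the user: {memory_block}\nUser: {msg}")
        transcript.extend([msg, reply])
    # After the session, distill what is worth carrying forward.
    long_term_notes.append(summarize(transcript))

chat_session(["I prefer answers as bullet points.", "Plan my week."])
chat_session(["What format do I like answers in?"])  # notes from session 1 carry over
```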
The way we influence an LLM's long-term memory today is through post-training such as fine-tuning, which requires human intervention. It isn't done autonomously by the model, unlike how our minds automatically update memory and learn through interactive experience. Human memory is not a passive storage vault but an active, reconstructive process in which memories are constantly being updated, reinterpreted, and integrated with new experiences. We are making big strides in letting LLMs autonomously learn from experience, but it will be some time before we crack this kind of memory for AGI.1
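For contrast, here is a minimal sketch of what actually changing the weights looks like. It assumes the Hugging Face transformers library and a small open model (gpt2) purely for illustration; real post-training pipelines involve far more data, curation, and evaluation.

```python
# Minimal sketch of post-training ("fine-tuning"): the model's long-term memory
# changes only when its weights are updated. Assumes Hugging Face `transformers`
# and the small gpt2 model purely for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# A fact we want the model to "remember" permanently, not just in context.
examples = ["Q: Who maintains the internal style guide? A: The docs team."]

model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LMs, passing labels = input_ids yields the next-token loss.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The new "memory" now lives in the updated weights, with no context needed.
model.save_pretrained("gpt2-finetuned")
```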
In conclusion, memory remains a critical gap in current LLM capabilities. We've examined how today's models handle memory, explored ongoing efforts to enhance their memory systems, and hinted at what more capable AI memory might look like in the future. In the next posts, I will discuss planning and reasoning.
While not directly on the topic of memory, I find the paper Welcome to the Era of Experience by Google DeepMind useful for framing. It gives another reason we want models to learn from experience: we are running out of training data.↩