Why AI Forgets Everything

Florian Tisson

Florian is one of the five co-founders of Cobey AI and an entrepreneurship student. At Cobey AI, he serves as CFO, responsible for managing the company's finances sustainably.

Why AI Forgets Everything – and why that's not a bug but a feature

Anyone who has used a chatbot like ChatGPT has noticed this: after a while, the AI suddenly seems to "forget" what was said earlier. This can feel frustrating – but in fact, it is a direct consequence of how Large Language Models (LLMs) are designed.

Limited memory – the context window

Every LLM operates within a context window – essentially its short-term memory. This window defines how much text (measured in tokens) the model can take into account in one session.

Older models like GPT-3 handled around 4,000 tokens. Current versions of GPT-4 process 32,000–128,000 tokens, and Anthropic's Claude up to 200,000 – enough to cover a full-length novel or more. Still, once this limit is reached, older parts of the conversation are cut off (Casciato, 2025; Van Droogenbroeck, 2025).

To put this in perspective, a token roughly corresponds to three-quarters of a word in English. This means that a 4,000-token window could handle approximately 3,000 words – roughly the length of a short academic paper. Modern models with 200,000-token windows can process the equivalent of a full-length novel, yet even this substantial capacity has its limits in extended conversations.
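
To make this arithmetic concrete, here is a minimal Python sketch using the rough 0.75-words-per-token rule of thumb from above; the function names and the fixed ratio are illustrative assumptions, since real tokenizers vary by model and language.

    # Back-of-the-envelope token estimate based on the ~0.75 words-per-token
    # rule of thumb; real tokenizers (and non-English text) behave differently.

    def estimate_tokens(text: str) -> int:
        """Approximate the token count from the word count (1 word ~ 1.33 tokens)."""
        return round(len(text.split()) / 0.75)

    def fits_in_context(text: str, context_window: int) -> bool:
        """Check whether a text would fit in a given context window."""
        return estimate_tokens(text) <= context_window

    essay = "word " * 3_000                       # ~3,000 words, a short academic paper
    print(estimate_tokens(essay))                 # ~4,000 tokens
    print(fits_in_context(essay, 4_000))          # just about fits in an old GPT-3-sized window
    print(fits_in_context(essay, 128_000))        # trivially fits in a larger modern window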

The technical reality behind forgetting

When an AI "forgets," it's not experiencing memory loss in the human sense. Instead, the older parts of the conversation are literally removed from the input that gets processed. The model doesn't have access to information that falls outside its context window – it's as if that conversation never happened from the AI's perspective.

This process typically works on a "sliding window" basis. As new messages are added to a conversation, the oldest messages are systematically removed to make room. Some systems employ more sophisticated strategies, such as preserving the initial system instructions and recent exchanges while removing middle portions, but the fundamental constraint remains.
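
As an illustration only (not any vendor's actual implementation), such sliding-window truncation could look roughly like this, assuming the conversation is a plain list of role/content messages and reusing the word-count token estimate from above:

    # Sketch of sliding-window truncation: keep the system prompt plus as many of
    # the most recent messages as fit into the token budget; everything older is
    # simply dropped from the model's input.

    def estimate_tokens(text: str) -> int:
        return round(len(text.split()) / 0.75)    # rule-of-thumb estimate, as above

    def truncate_history(messages: list[dict], budget: int) -> list[dict]:
        system, rest = messages[0], messages[1:]  # assume messages[0] is the system prompt
        kept, used = [], estimate_tokens(system["content"])
        for msg in reversed(rest):                # walk backwards from the newest message
            cost = estimate_tokens(msg["content"])
            if used + cost > budget:
                break                             # older messages fall out of the window
            kept.append(msg)
            used += cost
        return [system] + list(reversed(kept))

    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Here is my long project description: " + "detail " * 500},
        {"role": "assistant", "content": "Got it, thanks."},
        {"role": "user", "content": "Now summarize it for my boss."},
    ]
    print(truncate_history(history, budget=200))  # the long early message has been forgotten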

Why not unlimited memory?

The limitation is not a flaw but a conscious trade-off. The computational cost of transformer models grows quadratically with input length. Without a cap, responses would become prohibitively slow and expensive.

Moreover, research like "Lost in the Middle" (Liu et al., 2023) shows that models do not handle long sequences uniformly – they focus more on the beginning and end of the input, while information in the middle tends to be overlooked.

This quadratic scaling means that doubling the context length roughly quadruples the computational requirements. For a company serving millions of users simultaneously, this represents enormous infrastructure costs and energy consumption. Even with the most advanced hardware, there are practical limits to how much context can be processed in real-time while maintaining reasonable response speeds.
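
A toy calculation makes the quadratic growth visible; the numbers are only relative, since real inference cost also depends on model size, hardware, and optimizations such as caching:

    # Toy illustration of quadratic attention cost: the number of token pairs the
    # attention mechanism scores grows with the square of the context length.

    for tokens in (4_000, 8_000, 128_000, 200_000):
        print(f"{tokens:>7} tokens -> {tokens ** 2:>15,} token pairs to score")

    # Doubling the context from 4,000 to 8,000 tokens roughly quadruples the work:
    print((8_000 ** 2) / (4_000 ** 2))            # 4.0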

The attention mechanism that powers these models also faces inherent challenges with very long sequences. As the context grows, the model must score relationships between quadratically more pairs of tokens, leading to what is sometimes described as "attention dilution" – important connections get lost in the noise of processing vast amounts of information.

Real-world implications of memory limitations

These constraints manifest in various ways that users encounter daily. In customer service applications, an AI might lose track of a complex technical issue that spans multiple exchanges. In educational contexts, tutoring AIs may forget earlier explanations when helping students work through multi-step problems. Creative writing assistants might lose narrative consistency in longer stories, requiring users to repeatedly remind the AI about character details and plot points.

Professional applications feel these limitations acutely. Legal document analysis, medical consultation systems, and business strategy discussions often require maintaining context across extensive dialogues. The current memory constraints mean that such applications often require careful conversation management or external documentation to maintain continuity.

Emerging solutions – from summaries to long-term memory

Researchers and developers are experimenting with multiple strategies:

  • Larger context windows – state-of-the-art models can handle 100,000+ tokens, with some experimental systems pushing toward millions of tokens.
  • Summarization techniques – older chat history is compressed into shorter synopses that preserve key information while reducing token usage.
  • Retrieval-Augmented Generation (RAG) – external knowledge bases store prior content and retrieve relevant pieces into the prompt based on similarity searches.
  • Long-term memory modules such as MemoryBank (Zhong et al., 2023) – they log conversations, build hierarchical summaries, and apply selective forgetting inspired by human memory curves (a toy sketch of this decay idea follows after this list).
  • Hybrid architectures that combine different memory systems, such as maintaining separate stores for factual information, personal preferences, and conversation context.
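
To illustrate the selective-forgetting idea from the list above, here is a toy decay-based retention score; it is loosely inspired by human forgetting curves and is not the actual MemoryBank mechanism – the function, thresholds, and values are invented for the example.

    # Toy "selective forgetting": each stored memory has a retention score that
    # decays exponentially with age; low-scoring memories are pruned.

    import math

    def retention(age_hours: float, strength: float) -> float:
        """Exponential forgetting curve: retention drops as the memory ages."""
        return math.exp(-age_hours / strength)

    memories = [
        {"text": "User prefers concise answers",       "age_hours": 2.0,  "strength": 24.0},
        {"text": "One-off question about CSV parsing", "age_hours": 72.0, "strength": 12.0},
    ]

    for m in memories:
        score = retention(m["age_hours"], m["strength"])
        print(f"{m['text']!r}: retention={score:.2f} -> {'keep' if score > 0.1 else 'forget'}")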

Some companies are implementing clever workarounds. Vector databases can store conversation embeddings that capture semantic meaning, allowing systems to retrieve relevant past discussions without storing entire transcripts. Hierarchical summarization creates nested summaries at different levels of detail, preserving both broad context and specific details as needed.
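
The retrieval pattern behind such embedding-based approaches can be sketched in a few lines; the bag-of-words "embedding" below is a deliberately crude stand-in for a real embedding model and vector database, chosen only to keep the example self-contained:

    # Sketch of retrieval by semantic similarity: embed past messages, embed the
    # new query, and pull the most similar past message back into the prompt.
    # A toy word-count vector stands in for a learned embedding model here.

    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        return Counter(text.lower().split())      # toy stand-in for a real embedding

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    past_messages = [
        "We agreed the launch date is March 3rd.",
        "My dog is called Biscuit.",
        "The marketing budget is 20k euros.",
    ]
    store = [(msg, embed(msg)) for msg in past_messages]   # the "vector database"

    query = "When is the launch again?"
    best = max(store, key=lambda item: cosine(embed(query), item[1]))
    print("Retrieved context:", best[0])          # the launch-date message is recalled

Only the retrieved snippet, rather than the full transcript, is then inserted into the prompt, which keeps token usage bounded while preserving relevant context.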

Memory vs. privacy

Interestingly, "forgetting" is not always a weakness – sometimes it is a feature. OpenAI (2024) and Google (2025) introduced controllable memory functions, allowing users to decide what is stored, or to opt for temporary "incognito chats" that are deleted automatically. This reflects a delicate balance between technological progress and data privacy.

The privacy implications of AI memory are profound. Persistent memory systems raise questions about data ownership, consent, and the right to be forgotten. Users might share sensitive information assuming it will be forgotten after the session ends, only to discover it has been permanently stored. Conversely, users building long-term relationships with AI assistants may want certain preferences and context to persist.

Different regulatory environments are taking varied approaches to these challenges. European GDPR requirements emphasize user control over personal data, while other jurisdictions focus more on disclosure and consent mechanisms. This regulatory patchwork creates complex challenges for global AI systems that must balance memory capabilities with privacy compliance.

The psychological dimension

The forgetting behavior of AI systems also creates interesting psychological dynamics. Users often develop expectations based on human conversation patterns, where forgetting is gradual and selective rather than sudden and complete. This mismatch can create frustration and reduce trust in AI systems.

However, some users appreciate the clean slate that comes with AI forgetting. Sensitive conversations, personal struggles, or embarrassing moments don't carry forward indefinitely. This creates a unique space where users can experiment with ideas or seek help without fear of long-term judgment or recall.

Current industry approaches

Major AI developers are taking different strategies to address memory limitations. Some focus on expanding context windows through more efficient architectures. Others emphasize external memory systems that can be selectively accessed. Still others are exploring biological metaphors, implementing memory systems that mirror human forgetting curves and prioritization mechanisms.

The competitive landscape around AI memory is intensifying. Companies that successfully solve the memory problem while maintaining speed, cost-effectiveness, and privacy protection may gain significant advantages in applications requiring long-term user relationships.

Outlook

The central question is no longer whether AI will have memory, but how. A system that remembers user preferences appears more competent, helpful, and human. At the same time, long-term memory requires careful governance to avoid the risks of over-retention.

The future likely holds a spectrum of memory options rather than a one-size-fits-all solution. Different applications will demand different memory architectures – from ephemeral systems for privacy-sensitive tasks to comprehensive memory systems for long-term partnerships. Users may gain fine-grained control over what their AI assistants remember, forget, and prioritize.

We are at a turning point – today's "forgetful" systems are laying the groundwork for tomorrow's consistent, memory-enabled assistants. The challenge lies not just in building systems that can remember, but in building systems that remember wisely, ethically, and in service of human flourishing.

References (selection)

  • Liu et al. (2023): Lost in the Middle: How Language Models Use Long Contexts. arXiv.
  • Zhong et al. (2023): MemoryBank: Enhancing Large Language Models with Long-Term Memory. arXiv.
  • Goode, L. (2024): OpenAI Gives ChatGPT a Memory. WIRED.
  • Field, H. (2025): Anthropic's Claude chatbot can now remember your past conversations. The Verge.
  • Metral, K. (2025): Google's Gemini AI now remembers past chats by default. Cosmico.org.
  • Casciato (2025): Warum vergisst ChatGPT … [Why does ChatGPT forget …]. eurosoft.net.
  • Van Droogenbroeck (2025): Why Your AI Forgets Everything You Say. cduser.com.