Forget More to Learn More

The most important part of learning is not remembering things. It’s forgetting them.

Compress and Forget

The brain doesn’t try to remember everything. Instead, it prizes efficiency: it tries to remember only the most important things. Subconsciously, we know that it’s better to remember a few vital pieces of information than a bunch of inconsequential details.

Over time, we naturally compress information. We turn what we learn into simple rules of thumb. For example, when we tell stories, we focus on what’s memorable. In reflection, we focus on what’s actionable — the moral of the story. In both instances, we compress information, highlighting the most important details and omitting the rest.

Memory is biology’s way of compressing the past. But memory is imperfect and subjective. Contrary to popular belief, memory isn’t designed to help us remember the past; it’s designed to help us navigate the future. Memory is a teacher. If you remember that something bad happened, and you can figure out why, then you can try to keep that bad thing from happening again. Memories exist to help us learn from the past, so we don’t make the same mistakes over and over again.

A subtle, but crucial distinction. 

Culture: Cut and Compress

Compression is everywhere. It’s the process of distilling information down to its essential elements: the ability to take a large amount of information and explain it in a simpler, more beautiful way. Humans are information-processing machines, masters at compressing information.

Compression accelerates our rate of learning. It accounts for why things are interesting. As Tiago Forte once wrote: “The steeper the learning curve — the greater the improvement from the old rule to the new one — the more interesting you find a piece of data.” Learning new things helps us focus more on what’s important and less on what’s irrelevant. 

We condense big ideas into small packages all the time. Sometimes, we compress information consciously. Jeff Bezos compressed his business philosophy into two simple rules:

  1. It’s always Day 1
  2. Be obsessed with the customer

Both rules are clear, simple and memorable. 

Sometimes, compression happens unconsciously. We compress knowledge all the time, even when we’re not aware of it. 

Language is a beautiful example. A complex balance of form and function, it has many rules that we’re unaware of. The more common a word is, the more likely it is to become shorter or abbreviated. In turn, the more it gets abbreviated, the more likely we are to use it. Over time, we stop using inefficient words.

This isn’t the only example. The Bible itself is a compressed guide to life. Every story, every verse, every word is significant. Even then, we compressed the entire thing into a simple rule of thumb: “Do unto others as you would have them do unto you.” E = mc² is an incredible compression, too. By inventing the formula, Einstein explained a huge amount of our world in a simple and beautiful way.

Lessons from Picasso

Picasso sought compression too. As shown below, he tried to identify the fundamental spirit of a bull through a series of progressively simpler, more abstract images.


The last image still resembles a bull, but Picasso couldn’t have drawn it first. Each stage of the progression is necessary and sequential. It’s a process of creative destruction: breaking down, then building up — over and over again, removing what’s unnecessary at each step. Cut, cut, cut, all the way down to the face in the last image.

Compression is why teaching forces learning. To communicate anything you have to compress it. Teaching forces us to distill what we know and focus on the most important things. In the process of compressing ideas, you grapple with the ideas, which forces you to absorb the good ones and forget the bad ones. In theory, the best ideas and patterns of behavior should rise to the top over time. 

Forgetting: The Most Important Part of Learning

The secrets of compression may rest within deep learning algorithms. These are the algorithms that teach computers how to see. By studying these algorithms, we can learn about the human brain and improve the quality of our thinking. 

Since deep learning algorithms sort through giant data sets, they encounter a lot of irrelevant information. The algorithms strive to discard bad data while retaining good data. This presents a tradeoff — the same kind that we face every day: it is impossible to compress something without losing some of the context. Too much compression and they lose context; too much context and they lose compression.
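The tradeoff can be sketched with a toy experiment (the signal, the quantization steps, and the use of compressed byte count as a proxy for “memory cost” are all illustrative assumptions, not anyone’s actual method). Quantizing a signal more coarsely makes it cheaper to store but less faithful:

```python
import math
import zlib

# A toy "experience": a smooth signal with a little deterministic pseudo-noise.
signal = [math.sin(i / 10) + 0.05 * ((i * 37) % 7 - 3) for i in range(1000)]

def quantize(values, step):
    """Lossy compression: snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]

results = {}
for step in (0.01, 0.1, 0.5, 1.0):
    approx = quantize(signal, step)
    # Serialize, then losslessly compress, as a rough proxy for storage cost.
    size = len(zlib.compress(",".join(f"{v:.2f}" for v in approx).encode()))
    error = sum((a - b) ** 2 for a, b in zip(signal, approx)) / len(signal)
    results[step] = (size, error)
    print(f"step={step:<5} compressed bytes={size:<6} mean squared error={error:.5f}")
```

Coarser steps shrink the compressed size and grow the reconstruction error — there is no setting that wins on both axes at once.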

What’s the optimal tradeoff?

Deep learning algorithms invented a way to separate the signal from the noise, and new experiments reveal how. Like humans, these algorithms optimize for the most relevant features: they keep only the data that matters, measured by how relevant it is to the end goal.

This deep learning process happens in two phases. The first is the fitting phase, where the network learns to label its training data. Afterwards, the algorithm moves to the compression phase, where the network discards everything except the features with the most general relevance to those labels.

Source: Quanta Magazine

To distill the data and expose the signal, algorithms squeeze large amounts of information through bottlenecks. In the process, only the most relevant features are retained. 
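As a rough sketch of the idea — not the researchers’ actual method — here is a linear “bottleneck” built from a truncated SVD. The dataset and its dimensions are arbitrary assumptions: 20 observed features secretly driven by only 3 underlying factors. Squeezing the data through a 3-wide bottleneck preserves almost everything; a narrower one forgets too much:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 200 samples with 20 features, driven by only 3 underlying factors.
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
data = factors @ mixing + 0.1 * rng.normal(size=(200, 20))

def bottleneck_error(x, k):
    """Squeeze x through a k-dimensional linear bottleneck (truncated SVD)
    and return the mean squared reconstruction error."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    compressed = u[:, :k] * s[:k]              # k numbers per sample, not 20
    reconstructed = compressed @ vt[:k] + x.mean(axis=0)
    return float(np.mean((x - reconstructed) ** 2))

errs = {k: bottleneck_error(data, k) for k in (1, 2, 3, 10)}
for k, err in errs.items():
    print(f"bottleneck width k={k:2}: reconstruction error {err:.4f}")
```

The error falls sharply up to width 3, then barely improves afterwards: three numbers per sample capture nearly everything, and the other 17 dimensions were mostly noise worth forgetting.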

The researchers behind these experiments discovered a fascinating paradox: the most important part of learning is actually forgetting. Algorithms learn more as they forget more. They keep only the most relevant data; everything else gets discarded. With less data to carry, they become faster and faster over time.

Whether you’re an algorithm or a conscious being, the information bottleneck is a fundamental learning principle. Ideally, in its final state, the algorithm achieves the optimal tradeoff between accuracy and compression.

Dissecting Dreams

Back to evolution and the human body. Dreams help us forge memories. While we sleep, we forget many of the things we learn each day. This helps us remember the most important things — to lift the signal over the noise. 

Biologists at the University of Wisconsin propose that the synapses in our brains get “noisy” during the day. Their studies show that in order to learn, we have to grow connections, or synapses, between the neurons in our brains. These connections allow neurons to send signals to one another, quickly and efficiently. Sleep prunes away the noisy connections; without this compression process, memories end up fuzzy.

Forgetting is a feature, not a bug. 

Forget more to learn more.