In 1969, László Bélády and two IBM colleagues published a paging-machine anomaly showing FIFO could make four memory frames suffer ten page faults after three frames suffered nine, leaving generations of operating-systems students staring at the moment more memory became the wrong answer

In 1969, László Bélády and two IBM colleagues published a paging-machine anomaly showing FIFO could make four memory frames suffer ten page faults after three frames suffered nine, leaving generations of operating-systems students staring at the moment more memory became the wrong answer Featured Image

László Bélády’s anomaly begins with a result that still looks like a mistake when students first calculate it: the page-reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 produces nine page faults with three FIFO memory frames, then ten page faults with four.

One extra slot makes the machine miss more often. The desk gets bigger, and the worker walks to the filing cabinet more times.

Bélády, Robert A. Nelson, and Gerald S. Shedler published the result in a 1969 Communications of the ACM paper called “An anomaly in space-time characteristics of certain programs running in a paging machine”. IBM’s own publication record describes the central surprise directly: experiments had revealed cases where decreasing the size of the store also decreased running time.

That is why the anomaly survived its original machines. It is not only a paging curiosity from the mainframe era. It is a small, exact demonstration that a system can become worse after receiving the resource that was supposed to help it.

What FIFO actually does

FIFO means first-in, first-out. In a paging system, it evicts the page that has been sitting in memory the longest, regardless of whether that page is about to be needed again.

Imagine a desk with three folder slots. Every new folder lands on the desk. When the desk is full, the oldest folder gets pushed into a cabinet to make room.

That rule is easy to understand and cheap to track. It is also blind. It remembers arrival order, not usefulness.

Give the desk a fourth slot and common sense says the worker should make fewer trips to the cabinet. Bélády’s example shows the opposite. The fourth slot changes the aging order, and FIFO ends up throwing out a page just before the program asks for it again.

For readers thinking about modern memory pressure rather than old paging machines, Make Tech Easier’s guides to what RAM does in a PC and how Windows pagefile sizing works cover the everyday version of the same speed hierarchy: fast memory is precious, and misses are expensive.

The reference string that broke the instinct

The famous demonstration uses the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With three frames under FIFO, the system takes nine faults. With four frames, it takes ten.

A page fault is the moment when the requested page is not in fast memory and has to be fetched from slower storage. In real systems, the cost is not merely arithmetic. It is delay, disk or storage traffic, stalled work, and sometimes a visible pause.

The numbers are small because the example is built for chalkboards. The result is large because it punctures a core engineering reflex: if a machine is missing too often, give it more room.

That reflex is usually right when the replacement rule has the right mathematical property. FIFO does not have it.

Why LRU avoids the anomaly

Modern operating systems generally do not use plain FIFO for paging. They use policies shaped by recency, frequency, and cost, often through LRU-like approximations rather than exact textbook LRU.

LRU means least recently used. It evicts the page that has gone untouched the longest, which is closer to the way many programs actually behave.

The deeper reason LRU matters here is that it belongs to the family of stack algorithms. In a stack algorithm, the pages held by a smaller memory are always contained inside the pages held by a larger memory. Add a frame, and the smaller set does not get rearranged into a worse one.

That containment property blocks Bélády’s anomaly. FIFO lacks it, because adding memory changes the queue order and can change which page becomes the next victim.

Operating-systems courses still teach this because the arithmetic is compact and the warning is permanent. A policy can be simple, fast, and plausible, yet still carry a hidden trap.

The man behind the anomaly

László A. Bélády was born in Budapest in 1928 and trained in mechanical and aeronautical engineering before becoming one of the early figures in virtual-memory research. The IEEE Computer Society’s biography lists his long IBM career at the Thomas J. Watson Research Center and his later leadership roles at MCC and Mitsubishi Electric Research Laboratories.

The Charles Babbage Institute oral-history record notes that Bélády left Hungary after the 1956 revolution, worked in Germany and France, then joined IBM in the United States in 1961. By the mid-1960s, he was working directly on virtual storage and program behavior.

His name was already attached to another major idea before the anomaly paper. In 1966, he published “A Study of Replacement Algorithms for a Virtual-Storage Computer”, the work associated with the theoretical optimal replacement rule often called Bélády’s MIN or OPT.

OPT evicts the page that will not be needed for the longest time in the future. No real operating system can implement it perfectly, because real programs do not disclose their future references in advance, but the rule gives researchers a gold standard for comparison.

Where the lesson still hides

Today’s machines are far removed from the paging machines of the 1969 paper. The IBM Thomas J. Watson Research Center itself became IBM Research’s headquarters in 1961, at the beginning of the era that produced much of this virtual-memory work.

The underlying tradeoff did not disappear. Real caches in operating systems, processors, databases, browsers, and storage systems often use approximations because exact tracking costs time, space, power, or locking overhead.

Linux memory reclaim, for example, is built around LRU-style lists and later refinements, not a literal textbook list of every access. The Linux kernel documentation still describes infrastructure such as the unevictable LRU, because real memory management is full of pages that do not behave like classroom examples.

Database systems make similar compromises. PostgreSQL’s buffer manager is commonly explained through a clock-sweep style replacement policy rather than a pure LRU list, because low overhead matters when many processes are touching shared buffers.

That does not mean every approximation produces Bélády’s anomaly. It means the guarantee has to be earned. Once a designer leaves the clean world of stack algorithms, “more cache” is no longer a proof by itself.

The textbook problem that refuses to retire

Every year, students meet the same sequence and count the misses by hand. The first pass feels routine. The second pass feels wrong.

Three frames should not beat four. The page-fault counter says it does.

That is the power of the example. It compresses an entire engineering discipline into twelve references and two columns of marks: define the policy, run the workload, measure the result, and distrust whatever answer felt obvious before the trace began.

Bélády died in 2021, but the anomaly remains one of the cleanest traps in computer science. Somewhere inside a machine tonight, a cache is choosing what to forget, and the difference between a helpful extra slot and a harmful one is still hidden in the rule that decides what leaves first.

Subscribe to our newsletter!

Our latest tutorials delivered straight to your inbox

Make Tech Easier Editorial Team Avatar

Read next

When Bell Labs engineer Karl Jansky pointed a rotating antenna at the sky in 1932 looking for sources of transatlantic radio static, he kept picking up a faint hiss that peaked every 23 hours and 56 minutes, and he eventually realized he had become the first human to hear the center of the Milky Way.
The colour magenta does not exist anywhere in the spectrum of visible light, and your brain manufactures it on the spot whenever red and blue cones fire together, inventing a hue to fill a gap that physics never bothered to provide.
On 28 May 2009, Google demoed a product called Wave on stage at I/O for 80 minutes and got a standing ovation from developers who had no idea what they had just watched, and 15 months later the company quietly shut it down because almost nobody could explain to a friend what it was actually for
When Clair Patterson set out in 1948 to measure the age of the Earth using lead in meteorites, his samples kept coming back contaminated, and the seven-year detour he took to find the source ended with him almost single-handedly forcing leaded gasoline out of American cars by 1986.
The IBM 305 RAMAC stayed in production until 1961, weighed more than a ton, stored five million characters on fifty spinning platters, and still drew customers because the alternative was a room full of punched cards
In 1977, Ann Druyan recorded an hour of her brainwaves and heartbeat two days after she and Carl Sagan agreed to marry, and NASA pressed the compressed minute onto Voyager’s Golden Record as a private love signal now more than 25 billion kilometres from Earth
Every Apollo guidance computer that flew to the Moon had its software literally woven by hand at a Raytheon factory outside Boston, where women threaded copper wire through tiny magnetic cores to encode each bit as either a one or a zero, a process the engineers nicknamed LOL memory for Little Old Lady.
When Soviet pilot Viktor Belenko landed a top-secret MiG-25 at a Japanese airport in September 1976, American engineers tearing it down expected to find titanium and microchips, and instead they found vacuum tubes, rivets popped by hand, and a stainless steel airframe so heavy it could only fly fast in a straight line.