Some internet outages predicted for the coming month as ‘768k Day’ approaches
An internet milestone known as “768k Day” is getting closer, and some network administrators are shaking in their boots, fearing downtime caused by outdated network equipment.
The fear is justified, and many companies have taken precautions to update old routers, but some cascading failures are still predicted.
What is 768k Day?
The term 768k Day comes from the original mother of all internet outages known as 512k Day.
512k Day happened on August 12, 2014, when hundreds of ISPs around the world went down, causing billions of dollars in damages from lost trade and fees as networks lost connectivity or suffered packet loss.
The original 512k Day took place because routers ran out of memory for storing the global BGP routing table, a file that holds the IPv4 addresses of all known internet-connected networks.
At the time, a large chunk of the internet was being routed through devices that were allocating TCAM (ternary content-addressable memory) large enough to store no more than 512,000 internet routes.
But on August 12, 2014, Verizon added 15,000 new BGP routes, which pushed the global BGP routing table past the 512,000-entry mark without warning. On older routers, the global routing table overflowed its allocated memory, crashing the devices every time they attempted to read or work with the file. Companies like Microsoft, eBay, LastPass, BT, LiquidWeb, Comcast, AT&T, Sprint, and Verizon were all impacted.
Many legacy routers received emergency firmware patches that allowed network admins to set a higher threshold for the size of the memory allocated to handle the global BGP routing table.
Most network administrators followed documentation provided at the time and set the new upper limit at 768,000 — aka 768k.
Global BGP routing table reaching 768,000 limit on older routers
CIDR Report, a website that keeps track of the global BGP routing table, puts the size of this file at 773,480 entries; however, their version of the table isn’t official and contains some duplicates.
A Twitter bot named BGP4-Table, which has also been tracking the size of the global BGP routing table in anticipation of 768k Day, puts the actual size of the file at 767,392, just a hair away from overflowing.
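To make the two figures concrete, here is a minimal sketch of the arithmetic, assuming the limit is exactly 768,000 routes as described in this article. The function name `headroom` and the constant `LIMIT_768K` are made up for illustration; real routers may enforce their configured limit differently.

```python
# Illustrative only: the limit value and both table sizes are the
# figures quoted in the article, not live measurements.
LIMIT_768K = 768_000


def headroom(table_size: int, limit: int = LIMIT_768K) -> int:
    """Return how many routes fit before the limit is reached.

    A negative result means the table is already over the limit.
    """
    return limit - table_size


counts = {"BGP4-Table": 767_392, "CIDR Report": 773_480}

for source, size in counts.items():
    left = headroom(size)
    status = "under" if left >= 0 else "over"
    print(f"{source}: {size:,} routes, {abs(left):,} {status} the limit")
```

By this count, the BGP4-Table figure leaves room for only 608 more routes, while the CIDR Report figure, which includes duplicates, is already 5,480 past the mark.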
768k Day expected within a month
Glenn and Troutman both estimate that 768k Day will happen within the next month.
But unlike many network admins, they don’t expect the event to cause internet-wide outages like in 2014. However, both expect some companies and smaller, local ISPs to be affected.
“I would be mildly surprised if there was any interruption or outage at any real scale,” Glenn told ZDNet. “Ten years ago there was a much wider IP transit market. Now there are a handful of large players that have mostly suitable gear.”
“I don’t expect it to cause ‘massive disruption’ for the internet,” Troutman said, echoing his colleague’s thoughts. “The internet has a lot more resilience and redundancy than most people think.”
“There will certainly be some network operators and corporate end-user organizations who will be caught unaware and will experience problems,” he added.
Some network admins have prepared
The good news is that network admins have known about 768k Day for a long time, and many have already prepared, either by replacing old routers with new gear or by making firmware tweaks to allow devices to handle global BGP routing tables that exceed even 768,000 routes.
“Yes, TCAM memory settings can be adjusted to help mitigate, and even go beyond 768k routes on some platforms, which will work if you don’t run IPv6. These setting changes require a reboot to take effect,” Troutman said.
“The 768k IPv4 route limit is only a problem if you are taking ALL routes. If you discard or don’t accept /24 routes, that eliminates half the total BGP table size.
“The organizations that are running older equipment should know this already, and have the configurations in place to limit installed prefixes. It is not difficult,” Troutman added.
“I have a telco ILEC client that is still running their network quite nicely on old Cisco 6509 SUP-720 gear, and I am familiar with others, too,” he said.
The trick, according to Troutman, is to have ISPs and other network operators using older gear point all their outbound traffic for /24 routes to upstream transit providers, which are most likely running modern gear and will pick it up for their clients.
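The filtering approach Troutman describes can be sketched roughly as follows. This is a simplified model, not router configuration: the function names `filter_prefixes` and `lookup`, the sample prefixes, and the use of a single default route to stand in for "point traffic at the upstream provider" are all assumptions made for illustration.

```python
import ipaddress


def filter_prefixes(prefixes, max_len=23):
    """Drop IPv4 prefixes longer than max_len (for example, /24 routes)
    and make sure a default route is present to cover what was dropped."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    kept = [n for n in networks if n.prefixlen <= max_len]
    default = ipaddress.ip_network("0.0.0.0/0")
    if default not in kept:
        kept.append(default)  # stands in for the upstream transit provider
    return kept


def lookup(table, address):
    """Longest-prefix match: return the most specific route covering address."""
    addr = ipaddress.ip_address(address)
    matches = [n for n in table if addr in n]
    return max(matches, key=lambda n: n.prefixlen)


# Hypothetical sample of received routes (documentation/private ranges).
received = ["10.0.0.0/8", "172.16.0.0/12", "192.0.2.0/24", "198.51.100.0/24"]
table = filter_prefixes(received)

print(len(received), "routes received,", len(table), "installed")
print("192.0.2.55 ->", lookup(table, "192.0.2.55"))  # falls back to the default
print("10.1.2.3   ->", lookup(table, "10.1.2.3"))    # still uses the /8
```

Traffic for the filtered /24 networks still leaves the router; it simply follows the default route to the upstream provider, which holds the full table and makes the final routing decision.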
“If you are affected by 768k you know and have known and done everything you can already,” Glenn said, describing industry efforts to prepare for 768k Day.
Right now, it’s impossible to know how many routers and networks will be impacted on 768k Day, as there’s no Shodan search query that can give us the number and location of vulnerable routers.
But as Glenn told ZDNet, “the Cisco 6500/7600 product line was extremely popular for an exceptionally long time in many, many places,” so don’t be surprised if some networks go offline because they forgot about 768k Day and didn’t prepare.