Anycast Resolution Latency and Our Commitment to Transparency
by DNSFilter on Mar 23, 2023 4:21:00 PM
Early today at 11:40 a.m. UTC, we detected degraded performance across the DNS2 anycast network. Our team escalated the issue to our hosting provider immediately and took action to implement a fix by 1:00 p.m. UTC. Performance was fully restored by 1:44 p.m. UTC, and our team continued to monitor the situation. You can review the updates on our status page.
In the interest of transparency, I wanted to write this article to give our customers a detailed account of what we experienced and additional information about this somewhat unique issue.
The complete incident details
At 11:49 a.m. UTC we detected degraded performance on part of our DNS2 anycast network. One of our hosting providers stopped sending our secondary prefixes, pushing the majority of DNS2 traffic to our DNS1 anycast network, which was the initial cause of this degradation.
During the shift from DNS2 to DNS1, much of that traffic shifted to nodes in Copenhagen, Prague, Marseille, and Stockholm. But those nodes could not handle the entire surge from DNS2, and traffic was again rerouted to Sydney and Miami. While this failover mechanism maintained DNS resolution for our customers, it also added latency, primarily in central and eastern US time zones. DNS resolution times peaked at roughly 300ms (3/10 of a second), though the average response time in that window was 11ms.
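A 300ms peak and an 11ms average can both be true of the same window. The sketch below uses a made-up latency mix, not measured data from this incident, to show how a small share of slow, rerouted queries can raise the peak sharply while leaving the mean low. The 2ms baseline and the 3% slow share are assumptions chosen purely for illustration.

```python
from statistics import mean

# Hypothetical latency mix, NOT measured incident data:
# 97 of every 100 queries are served nearby, 3 are rerouted far away.
FAST_MS = 2.0    # assumed latency for a query served by a nearby node
SLOW_MS = 300.0  # peak latency reported in this post
samples = [FAST_MS] * 97 + [SLOW_MS] * 3

average_ms = mean(samples)
peak_ms = max(samples)

print(f"average: {average_ms:.2f} ms, peak: {peak_ms:.0f} ms")
```

With this assumed mix the mean works out to just under 11ms even though the slowest queries take 300ms, which is why an average alone can hide what some users experienced.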
Since we use our own service internally, we also experienced this incident firsthand. While you might not have noticed the impact if you were browsing a news site at this time, sites that use a lot more dynamic resources may have seemed slow based on the knock-on impacts of slower resolution.
Because we saw this incident occur in real time, we immediately escalated the issue to our provider and collaborated to resolve the problem. Our hosting provider is also conducting further RCA (root cause analysis) to understand what led to the routing interruption of our secondary prefixes.
Our fully redundant architecture allowed DNS resolution to continue, despite the increased resolution latency.
Changes we’re making
We are still investigating this incident with our hosting provider, as mentioned above. One area where we want to do better is reducing the MTTR (mean time to recovery) for these types of situations. We believe we can resolve these issues significantly faster, even when the impact is low.
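For reference, the recovery time for this incident can be worked out from the timestamps in the summary at the top of this post. This is a minimal sketch: the variable names are illustrative, the calendar date is assumed to be the publication date, and MTTR proper is the mean of this figure across many incidents, not a single value.

```python
from datetime import datetime, timezone

def utc(hour, minute):
    # Assumes the incident date matches the post date (23 March 2023).
    return datetime(2023, 3, 23, hour, minute, tzinfo=timezone.utc)

detected = utc(11, 40)     # degraded performance detected (summary time)
fix_applied = utc(13, 0)   # fix implemented
restored = utc(13, 44)     # performance fully restored

time_to_fix_min = (fix_applied - detected).total_seconds() / 60
time_to_recovery_min = (restored - detected).total_seconds() / 60

print(f"time to fix: {time_to_fix_min:.0f} min")
print(f"time to recovery: {time_to_recovery_min:.0f} min")
```

On those timestamps, the fix landed 80 minutes after detection and full recovery took 124 minutes.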
We are also reviewing internal processes and how we’ve structured our architecture to determine what changes we can make to reduce the impact surface area if an anycast node goes down.
When we built our anycast network, we purposefully created two parallel BGP networks so that if one network had any failures or latencies, the other network would pick up the slack. In one way, this incident was a testament to the success of that strategy; but in another way, this incident will allow us to build further improvements to account for the infinite landscape of problems that come with running a complex global anycast network.
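The general idea of one network picking up the slack for another can be sketched from a client's point of view. This is an illustration only and not our implementation: in this incident the shift happened at the routing layer, which is not modeled here, and the function names, the error type, and the returned address are all hypothetical.

```python
from typing import Callable, Sequence

class ResolutionError(Exception):
    """Raised when no configured network returns an answer."""

def resolve_with_fallback(name: str, networks: Sequence[Callable[[str], str]]) -> str:
    # Try each network in order; move on when one fails or times out.
    errors = []
    for query in networks:
        try:
            return query(name)
        except (TimeoutError, ConnectionError) as exc:
            errors.append(exc)
    raise ResolutionError(f"all networks failed for {name}: {errors}")

# Hypothetical stand-ins for the two parallel networks.
def dns2(name: str) -> str:
    raise TimeoutError("DNS2 unreachable")

def dns1(name: str) -> str:
    return "192.0.2.10"  # documentation-range address, not a real answer

answer = resolve_with_fallback("example.com", [dns2, dns1])
print(answer)
```

The query still succeeds when the first network is unavailable, at the cost of whatever extra time the failed attempt and the second path add.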
I keep saying transparency
I often compare the service we provide to oxygen. If we’re controlling the oxygen flow for other companies like ours, we need all of the gauges to report accurately and every tank has to be filled.
Providing our customers with a reliable, high-performance service remains a core value of ours. We know that we are an integral part of your technology stack, one that you need to simply work. That’s why we take incidents like this very seriously.
But I also recognize the need to share information when things like this occur. I’m a software user, too. I get impacted by incidents, too. As a technical user, I want answers to why these things occur. That is what we strive to do here: Be honest and responsive when incidents of this type do occur.
We are committed to our customers beyond the product itself. Each of you has chosen to partner with DNSFilter as your DNS resolution and filtering provider, deploying security to your organization via DNS through us. Thank you for choosing us, and we will continue to work hard to ensure that oxygen levels are at full capacity. And if the readings are ever off, we will always let you know.
Visit DNSFilter’s status page for details on this incident.