The Journey of the Machine Learning Technology that Powers DNSFilter
by Serena Raymond on Apr 27, 2020 12:00:00 AM
In 2018, DNSFilter acquired Webshrinker, a company that uses machine learning to provide website screenshot and domain intelligence API services. We wanted to highlight how Webshrinker began and how far the technology has come.
How Webshrinker got started
Adam Spotton, now Head of Data Science at DNSFilter, started Webshrinker in 2012 as a screenshot-as-a-service company. The product was initially created to fill a need in education: a provider of curated lists of educational websites needed thumbnails of those resources, but there were thousands of websites to screenshot.
Rather than do the work manually or use an incomplete solution (there was only one other screenshot-as-a-service at the time), Adam built Webshrinker to automatically take screenshots of sites that could then be embedded in the educational resource solution, saving countless hours of manual work.
In 2015, Webshrinker grew as domain categorization was added and became the main driver of the product.
Using cloud computing and machine learning technology, Webshrinker classifies websites into sets of categories by scanning URLs. Users of Webshrinker are able to take the URL-to-category mapping (available to them as either an API or downloadable database) and use it within their software or services.
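For readers curious what that looks like in practice, here is a minimal sketch of a URL-to-category lookup in Python. The endpoint path, the base64 URL encoding, the basic-auth scheme, and the response shape are all assumptions made for illustration; the actual API documentation is the source of truth.

```python
import base64
import requests

API_KEY = "your-api-key"        # placeholder credentials
API_SECRET = "your-api-secret"

def lookup_categories(url: str) -> list[str]:
    """Return the category labels reported for a URL (illustrative only)."""
    # Assumption: the target URL is passed as URL-safe base64 in the path.
    encoded = base64.urlsafe_b64encode(url.encode()).decode()
    endpoint = f"https://api.webshrinker.com/categories/v3/{encoded}"

    # Assumption: HTTP Basic auth with the key/secret pair.
    resp = requests.get(endpoint, auth=(API_KEY, API_SECRET), timeout=10)
    resp.raise_for_status()

    # Assumed response shape: {"data": [{"categories": [{"label": ...}, ...]}]}
    data = resp.json()["data"][0]
    return [c["label"] for c in data.get("categories", [])]

print(lookup_categories("https://example.com"))
```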
After a Kickstarter campaign in 2016, DNSFilter became a Webshrinker customer and began integrating Webshrinker’s domain categorization into the DNSFilter product.
Iterating on technology
Webshrinker has gone through many iterations, but one of the main transformations has been the technology it uses to simulate a full browser session. Initially, it used Selenium. Then we moved on to a combination of PhantomJS and AWS Lambda, but that caused rendering issues that affected the screenshot API.
Now Webshrinker simulates an actual Chrome browser without using third-party libraries like Selenium.
This is one of the most important aspects of Webshrinker: its ability to imitate an actual person who starts a browser session. This is particularly important when it comes to threat detection.
When Webshrinker processes a page, it opens a browser simulation. If that simulation did not mimic human behavior and a human-initiated web visit, deceptive sites could fool Webshrinker itself.
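We don't publish the implementation details, but the general idea of driving headless Chrome directly, without a driver library like Selenium, can be sketched with Chrome's own command line. The binary name, flags, user agent string, and window size below are illustrative assumptions, chosen so the visit resembles an ordinary desktop session.

```python
import subprocess

CHROME = "google-chrome"  # binary name varies by platform (assumption)

def capture(url: str, out_path: str = "screenshot.png") -> None:
    """Render a page in headless Chrome and save a screenshot,
    presenting a viewport and user agent similar to a real desktop visit."""
    subprocess.run(
        [
            CHROME,
            "--headless",
            "--disable-gpu",
            f"--screenshot={out_path}",
            "--window-size=1366,768",          # common desktop resolution
            "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
            "--virtual-time-budget=5000",      # give page JavaScript time to run
            url,
        ],
        check=True,
    )

capture("https://example.com")
```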
In addition to changes to the browser simulator over time, we also regularly pull lists of domains that have recently been flagged as malicious by the system so we can spot-check them for accuracy.
For instance, we might take 400 domains marked as phishing sites and run them through aggregate tools like VirusTotal. (VirusTotal is our standard benchmark because it is a collaboration of over 150 antivirus and domain scanning tools.) This gives us an indication of how many false positives (non-malicious sites marked malicious) Webshrinker is flagging.
This is an example of a VirusTotal spot check done on a Webshrinker-flagged malicious site:
Doing these routine spot checks allows us to see where Webshrinker might be less accurate so we can continue to optimize the technology. As it stands, Webshrinker's false positive rate is roughly 4%.
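As an illustration of the arithmetic behind that number, here is a hedged sketch of a spot check: querying VirusTotal's public v3 domain endpoint for each flagged domain and treating any domain that no engine flags as a likely false positive. The API key, the batch of domains, and the zero-verdict threshold are assumptions; the real review process also involves human judgment.

```python
import requests

VT_API_KEY = "your-virustotal-api-key"  # placeholder

def vt_malicious_count(domain: str) -> int:
    """Return how many VirusTotal engines flag the domain as malicious."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/domains/{domain}",
        headers={"x-apikey": VT_API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return stats.get("malicious", 0)

def spot_check(flagged_domains: list[str]) -> float:
    """Estimate the false positive rate for a batch of flagged domains."""
    # Simplification: a domain no VirusTotal engine flags counts as a false positive.
    false_positives = sum(1 for d in flagged_domains if vt_malicious_count(d) == 0)
    return false_positives / len(flagged_domains)

# e.g. a batch of 400 domains recently marked as phishing:
# rate = spot_check(phishing_batch)
# print(f"Estimated false positive rate: {rate:.1%}")
```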
Continued improvements
No software product is ever really finished, and that is especially true of machine learning technology. We are actively working on improvements that will make Webshrinker even more accurate, and every spot check we run serves to make it a stronger categorization tool.
We are also continually optimizing the way Webshrinker detects threats and feeding it sites to learn from. By doing this, we enable Webshrinker's machine learning to better sort domains into malicious and non-malicious categories.
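To make that idea concrete without pretending to know Webshrinker's internals, the toy sketch below trains a generic text classifier on labeled page content and shows how newly labeled sites would be folded back into training. The features, model, and data are all illustrative assumptions, not Webshrinker's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled examples: page text paired with a malicious (1) / benign (0) label.
pages = [
    "verify your account immediately enter your password to avoid suspension",
    "read our latest articles on gardening and seasonal recipes",
    "your computer is infected click here to download the security update now",
    "official documentation for the open source web framework",
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(pages, labels)

# Newly labeled sites can be appended to the training set and the model refit,
# which is the "feeding it sites to learn from" loop described above.
print(model.predict(["enter your bank password to confirm your identity"]))
```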
Planned improvements include creating additional, more granular categories. Eventually, image analysis will play a larger role in scanning the page during categorization to identify subcategories.
If you have a use for Webshrinker’s services, get in touch with us.
Part 2 in this series shows how Webshrinker's website categorization works.