DNSFilter Chief Data Scientist: Where we're going, we don't need (negative) labels

by DNSFilter Team on Oct 24, 2024 4:00:00 PM

Have you ever tried to build a machine learning classifier where you only had labels for one of the classes?

In computer security, researchers usually have easy access only to labels for malicious samples (malware, phishing domains, etc.), while labels for benign samples (productivity software, e-commerce domains, etc.) are either missing entirely or tedious and expensive to collect at scale. As a result, researchers typically treat the “known bad” samples as malicious and presume that everything else is benign.

In recent research published by DNSFilter's Chief Data Scientist, David Elkind, we show that this shortcut yields a biased model compared with an alternative procedure that removes the malicious-but-unlabeled samples from the training set. The alternative procedure delivers significant improvements in model quality on two different computer security datasets.
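To make the source of the bias concrete, here is a minimal toy sketch. It is not the method from the paper. Everything in it is an assumption made for illustration: the synthetic one-dimensional "score", the 30% labeling rate, the nearest-centroid rule, and above all the oracle that knows which unlabeled samples are actually malicious. How the paper identifies those samples is not described in this post.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical 1-D "maliciousness score": benign ~ N(0, 1), malicious ~ N(2, 1).
benign = [rng.gauss(0.0, 1.0) for _ in range(1000)]
malicious = [rng.gauss(2.0, 1.0) for _ in range(1000)]

# Assume only 30% of malicious samples carry a label; the rest are unlabeled.
labeled_bad = malicious[:300]
hidden_bad = malicious[300:]


def centroid_threshold(positives, negatives):
    """Nearest-centroid rule in 1-D: flag x as malicious when x > midpoint."""
    return (mean(positives) + mean(negatives)) / 2.0


def recall(threshold, positives):
    """Fraction of truly malicious samples that the rule flags."""
    return sum(x > threshold for x in positives) / len(positives)


# Naive: every unlabeled sample, including hidden malicious ones, is "benign".
naive_threshold = centroid_threshold(labeled_bad, benign + hidden_bad)

# Cleaned: hidden malicious samples are removed from the negative set.
# ASSUMPTION: an oracle removes them here, standing in for whatever
# procedure is used in practice to find malicious-but-unlabeled samples.
clean_threshold = centroid_threshold(labeled_bad, benign)

naive_recall = recall(naive_threshold, malicious)
clean_recall = recall(clean_threshold, malicious)

print(f"naive:   threshold={naive_threshold:.2f}, recall={naive_recall:.2f}")
print(f"cleaned: threshold={clean_threshold:.2f}, recall={clean_recall:.2f}")
```

In this toy setup, the hidden malicious samples drag the "benign" centroid toward the malicious one. That pushes the decision threshold up, so the naive model misses more truly malicious samples than the cleaned one.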

The full research paper is available for download. Additional materials, including the code and the CAMLIS 2024 poster David presented on October 24, are available on GitHub.

