Unreliable Narrators: Hallucinations May Be Causing Your Generative AI Tools To Lie to You
by Alex Applegate on Jul 25, 2023 1:17:33 PM
Artificial intelligence is enjoying what appears to be a golden era, but that isn't always a good thing, and sometimes it will flat-out lie to you.
Web-based artificial intelligence tools, particularly Large Language Models (LLMs) such as ChatGPT, have experienced explosive growth in the last year. ChatGPT has grown to over one billion users in a matter of months, Stable Diffusion has become a household name in some circles, and hundreds (if not thousands) of tools have emerged that can create a digital picture from a description, compose entire songs in minutes, or maybe even just do your homework for you. Over just the last 30 days, DNSFilter has observed a 55% increase in traffic to domains related to "OpenAI" and a 32% increase in traffic to domains related to "ChatGPT".
Understandably, there has been an equivalent panic. Not only can these tools make digital art or music, they can be trained specifically to imitate the work of another artist. And even when nobody's intellectual property is being stolen, is a song, image, or piece of prose generated by a person using an AI tool actually art? If students can ask "the interwebs" to write their essays, how can a teacher evaluate whether the material was understood? From there it's a short jump to concerns over deepfakes and the threat they pose to trusting any form of media around us. I even gave an interview last month about the urgency of heading off the threat of a rogue AI.
The Threat of AI
Not to be ignored is the very real threat that malicious actors can leverage these same AI tools and concepts to analyze, attack, and deceive our networks and web resources, and that those tools can even be turned adversarially against our cybersecurity defenses, whether those defenses are human, automated, or AI-enhanced.
Each of those concerns is very real, and each is the subject of intense debate right now. Some venues are banning AI-generated art outright, and unions are taking fierce stands against the use of AI to create movies, music, voices, and images in commercial spaces. There is a thriving market of detection engines that use AI to check whether essays were written by an AI. Politicians, policy makers, and heads of industry are holding urgent meetings to figure out how best to legislate the technology, or at least how to institute guardrails and ethics guidelines to stay ahead of an emerging industry that has taken on a life of its own. Everywhere you look, artificial intelligence is both the benevolent savior destined to reshape the world and a malevolent, unstoppable force poised to destroy everything it touches, and it's readily available in every corner of the Internet.
And the concept is truly magical. The model is trained on some massive body of information, say, huge chunks of the Internet, and it learns everything about everything. If you ask it a question, it'll give you an answer: a really believable answer that sounds more and more like a human wrote it every day. And if you don't like the initial version, you can keep tuning your request and adding parameters until it understands your requirements, and it'll produce something magical, probably better than you could yourself, in next to no time. Life will never be the same again.
AI Hallucinations
Except... well, sometimes it misses the bar. Even early on, word began to spread that huge, fully realized studies could be produced entirely by the model, unsupervised. Then someone with more than a surface-level understanding of the topic actually read the results, and much of the time they sounded far less impressive. Don't get me wrong: that's still unbelievably impressive, but maybe it's not the panacea we thought it was, at least not yet. And it does get worse.
For example, stories have begun to emerge about the models sometimes making typographical errors. How can a machine learning model that learned how to use and spell a word by examining a large number of examples then misspell it? Realistically, it probably shouldn't, but the inner workings of such a program are extremely complex, and sometimes the decision-making logic follows the wrong branch. Even more strangely, if part of your instructions is to watch out for mistakes, most of the time it will probably catch itself, assuming it doesn't hit the same snag in logic again. The New Yorker published a couple of articles earlier this year analyzing the technical aspects of how this is possible.
The industry seems to have settled on calling these errors in judgment "hallucinations," though as a euphemistic anthropomorphism that may not be much more comforting than "glitch." There are a number of things that could cause such a hallucination. For example, the machinery needed to run models over these massive datasets can be very expensive, and companies may choose to cut corners where they can: cheaper machines, perhaps, or smaller datasets. At the same time, there are very real efforts to leverage these tools to work smarter, and a very real tendency to depend on them far too much.
There are too many instances to link, but a search on either side of the education divide turns up a deluge of complaints from teachers that their students are using ChatGPT to coast through their classes, and just as large a flood of students complaining that teachers fed their hard work through an AI detector and threatened them with failure because the tool scored their work as certainly written by an AI.
In New York, a lawyer is facing sanctions because his court filings cited six cases that were entirely fabricated by ChatGPT and he failed to verify them. A researcher for the Wall Street Journal reported a case where a generative AI was asked for a definition of something that didn't exist, so it made one up and even provided references. The New York Times recently published a piece walking through dozens of these inaccurate responses, an effective exercise in demonstrating how common they are.
And there are more frightening, intentionally malicious risks as well. There is even research into capitalizing on hallucinations in generative AI to deliver malware when you go looking for something that doesn't actually exist. There is also research into adversarial inputs designed to force an AI into a misclassification based on its training model and data.
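One concrete version of that first risk involves software packages: if a model hallucinates a library that was never published, an attacker who registers that name gets to decide what a trusting developer installs. The sketch below is not from that research and not anything the article prescribes; it's a minimal illustration, assuming a Python workflow, of the kind of cheap existence check that helps. It queries PyPI's public JSON API, and the package name passed in is purely hypothetical.

```python
# Minimal sketch (illustrative only): before installing a package an AI
# assistant suggested, confirm it actually exists on PyPI. A hallucinated
# package name is exactly the gap an attacker can register and fill with malware.
import sys
import urllib.error
import urllib.request


def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI's public JSON API knows about this package."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False  # a 404 means the name has never been published


if __name__ == "__main__":
    # Hypothetical usage: pass the package name the model suggested.
    suggested = sys.argv[1] if len(sys.argv) > 1 else "requests"
    if package_exists_on_pypi(suggested):
        print(f"'{suggested}' exists on PyPI -- still vet it before installing.")
    else:
        print(f"'{suggested}' is not on PyPI; it may be a hallucination.")
```

An existence check like this is only a first filter, of course; a name that does resolve still deserves the same scrutiny as any other dependency.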
Generally speaking, however, the good news is that if you are even moderately diligent about checking the output and doing basic fact-checking against the results, an AI hallucination can be relatively easy to detect. Misspellings and links that don't exist can be caught with a little bit of effort. The bigger issue arises when a user is depending on the AI to produce results in a domain where they don't have enough knowledge to verify or discredit them, or when the volume of output is so large that it becomes difficult to comb through the entire response.
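Some of that basic fact-checking can even be automated. The snippet below is a rough sketch rather than a tool the article describes: it pulls URLs out of a model's answer and checks whether each one actually responds, since a reference that was never registered is one of the cheapest hallucination signals to catch. The sample answer and the second URL are invented for illustration.

```python
# Rough sketch: extract the URLs from a model's answer and verify that each
# one actually responds. A link that never existed is an easy hallucination
# signal to catch automatically.
import re
import urllib.error
import urllib.request

URL_PATTERN = re.compile(r"https?://[^\s)\"'>]+")


def check_links(text: str) -> dict:
    """Map each URL found in the text to True (reachable) or False (dead)."""
    results = {}
    for url in URL_PATTERN.findall(text):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                results[url] = resp.status < 400
        except (urllib.error.URLError, ValueError):
            results[url] = False  # bad hostname, refused connection, or HTTP error
    return results


if __name__ == "__main__":
    # Hypothetical model output: one real site, one invented citation.
    answer = ("See https://www.dnsfilter.com for details, "
              "and the study at https://example.invalid/hallucinated-study")
    for url, reachable in check_links(answer).items():
        print(("OK   " if reachable else "DEAD ") + url)
```

A dead link isn't proof of a hallucination, and a live one isn't proof the citation says what the model claims, so a check like this only narrows down what a human still has to read.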
These tools have the potential to provide novel education and improved foundations for work and for play, but they are merely tools and should always be used as a starting point, not as a destination. There will almost certainly be cases where AI can perform specific classification tasks and repetitive operations better, faster, and more accurately than humans ever could, but the guiding principle always needs to be: trust, but verify.