Made-Up Words Trick AI Text-To-Image Generators

Specially crafted patterns can fool face recognition systems. So one scientist designed nonsense words to see whether they could similarly trick text-to-image generators.

(Image credit: Titima Ongkantong/Shutterstock)

Adversarial images are pictures that contain carefully crafted patterns designed to fool computer vision systems. The patterns cause otherwise powerful face or object recognition systems to misidentify things or faces they would normally recognize.

This kind of deliberate trickery has important implications since malicious users could use it to bypass security systems.

It also raises interesting questions about other kinds of computational intelligence, such as text-to-image systems. Users type in a word or phrase and a specially trained neural network uses it to conjure up a photorealistic image. But are these systems also susceptible to adversarial attack and if so, how?

Today we get an answer thanks to the work of Raphaël Millière, an artificial intelligence researcher at Columbia University in New York City. Millière has discovered a way to trick text-to-image generators using made-up words designed to trigger specific responses.

Adverse Consequences

The work again raises security issues. “Adversarial attacks can be intentionally and maliciously deployed to trick neural networks into misclassifying inputs or generating problematic outputs, which may have real-life adverse consequences,” says Millière.

In recent months, text-to-image systems have advanced to the point that users can type in a phrase, such as “an astronaut riding a horse”, and receive a surprisingly realistic image in response. These systems are not perfect, but they are nevertheless impressive.

Nonsense words can trick humans into imagining certain scenes. A famous example is the Lewis Carroll poem Jabberwocky: “'Twas brillig, and the slithy toves, Did gyre and gimble in the wabe…” For most people, reading it conjures up fantastical images.

Millière wondered whether text-to-image systems could be similarly vulnerable. He used a technique called “macaronic prompting” to create nonsense words by combining parts of real words from different languages. So the word “cliff” is Klippe in German, scogliera in Italian, falaise in French and acantilado in Spanish. Millière took parts of these words to create the nonsense term “falaiscoglieklippantilado”.
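
The paper does not prescribe a single recipe for splicing the fragments, but the basic move is easy to sketch. The short Python snippet below is a rough illustration of that splicing, not Millière's exact procedure; the function name hybrid_nonce and the fixed-fraction cut are assumptions made for this example.

```python
# A rough sketch of the word-splicing idea behind "macaronic prompting":
# join fragments of translations of the same word into one nonsense token.
# The fixed-fraction cut is an illustrative assumption, not the paper's rule.

def hybrid_nonce(translations: list[str], frac: float = 0.6) -> str:
    """Concatenate the leading fragment of each translation into a nonsense word."""
    parts = []
    for word in translations:
        cut = max(2, int(len(word) * frac))  # keep roughly the first 60% of each word
        parts.append(word.lower()[:cut])
    return "".join(parts)

# "cliff" in French, Italian, German and Spanish, as in the article's example.
print(hybrid_nonce(["falaise", "scogliera", "klippe", "acantilado"]))
# prints "falascoglkliacanti" -- in the spirit of "falaiscoglieklippantilado"
```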

To his surprise, putting this word into the DALL-E 2 text-to-image generator produced a set of images of cliffs. He created other words in the same way with comparable results: insekafetti for bugs, farpapmaripterling for butterfly, coniglapkaninc for rabbit and so on. In each case, the generator produced realistic images of the thing named by the corresponding English word.

Millière even combined these made-up words into sentences. For example, the sentence “An eidelucertlagarzard eating a maripofarterling” produced images of a lizard devouring a butterfly. “The preliminary experiments suggest that hybridized nonce strings can be methodically crafted to generate images of virtually any subject as needed, and even combined together to generate more complex scenes,” he says.

A farpapmaripterling lands on a feuerpompbomber, as imagined by the text-to-image generator DALL-E 2 (Source: https://arxiv.org/abs/2208.04135)

Millière thinks this is possible because text-to-image generators are trained on a wide variety of pictures, some of which must have been labelled in foreign languages. This allows the made-up words to encode information that the machine can understand.

The ability to fool text-to-image generators raises a number of concerns. Millière points out that technology companies put great care into preventing illicit use of their technologies.

“An obvious concern with this method is the circumvention of content filters based on blacklisted prompts,” says Millière. “In principle, macaronic prompting could provide an easy and seemingly reliable way to bypass such filters in order to generate harmful, offensive, illegal, or otherwise sensitive content, including violent, hateful, racist, sexist, or pornographic images, and perhaps images infringing on intellectual property or depicting real individuals.”

Unwanted Imagery?

He suggests that one way of preventing the creation of unwanted imagery would be to remove any examples of it from the data sets used to train the AI system. Another option is to check every image the system creates before making it public, by feeding it into an image-to-text system and filtering out any image whose description contains unwanted text.
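
Such an output-side check is straightforward to picture in code. The sketch below is a minimal illustration of that caption-then-filter idea, assuming hypothetical generate_image and caption_image functions and a purely illustrative keyword list; it is not drawn from the paper.

```python
# A minimal sketch of the caption-then-filter mitigation described above.
# `generate_image` and `caption_image` are hypothetical stand-ins for a
# deployment's actual generator and image-to-text model.
from typing import Any, Callable

BLOCKED_TERMS = {"violent", "gore", "weapon"}  # illustrative list only

def is_publishable(prompt: str,
                   generate_image: Callable[[str], Any],
                   caption_image: Callable[[Any], str]) -> bool:
    """Generate an image, caption it, and reject it if the caption trips the blocklist."""
    image = generate_image(prompt)
    caption = caption_image(image).lower()
    return not any(term in caption for term in BLOCKED_TERMS)
```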

For the moment, opportunities to interact with text-to-image generators are limited. Of the three most advanced systems, two, Parti and Imagen, were developed by Google, which is not making them available to the public because of various biases it has discovered in their inputs and outputs.

The third system, DALL-E 2, was developed by OpenAI and is available to limited numbers of researchers, journalists and others. This is the one Millière used.

One way or another, these systems, or others like them, are bound to become more widely used, so understanding their limitations and weaknesses is important for informing public debate. A key question for technology companies, and more broadly for society, is how these systems should be used and regulated. Such debate is urgently needed.


Ref: Adversarial Attacks on Image Generation With Made-Up Words: arxiv.org/abs/2208.04135
