Why is GPT-4P vulnerable to multi-modal hint injection image attacks?-AI-php.cn

Why is GPT-4P vulnerable to multi-modal hint injection image attacks?

WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Release： 2023-10-30 15:21:17

forward

1444 people have browsed it

OpenAI’s new GPT-4V version supports image uploading, which brings a new attack path, making large language models (LLM) vulnerable to multi-modal injection image attacks. Attackers can embed commands, malicious scripts, and code in images, which the model then complies with.

Multimodal prompt injection image attacks can leak data, redirect queries, generate error messages, and execute more complex scripts to redefine how LLM interprets data. They can repurpose LLMs to ignore previously erected security guardrails and execute commands that could compromise the organization, posing threats ranging from fraud to operational sabotage.

All businesses that use LLM as part of their workflow face difficulties, but those that use LLM as the core of their business for image analysis and classification face the greatest risk. Attackers utilizing a variety of techniques can quickly change how images are interpreted and classified, leading to more confusing results. When LLM's prompts are overwritten, malicious commands and execution scripts are more likely to be ignored. Attackers can commit fraud and operational sabotage by embedding commands in a series of images uploaded to LLM, and can also facilitate social engineering attacks

Images are an attack vector that LLM cannot defend against

Since LLM does not perform data cleaning steps during its processing, each image is unreliable. Just as it is very dangerous to let identities roam freely on the network without access control to every data set, application or resource, there are also dangers for images uploaded into LLM

Enterprise-owned In the case of private LLM, least privilege access must be adopted as a core network security policy

Simon Willison recently explained in detail in a blog post why GPT-4V has become the main avenue for prompt injection attacks, and pointed out that LLM Fundamentally gullible. Blog post link: https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/

Willison shows how to hijack autonomous artificial intelligence agents, such as Auto-GPT, through prompt injection. He explained in detail a simple visual hint injection example that started with embedding a command in a single image and gradually developed into a visual hint injection penetration attack. Paul Ekwere, senior manager of data analytics and artificial intelligence at BDO in the UK, said : "Injection attacks pose a serious threat to the security and reliability of LLM, especially to vision-based models that process images or videos. These models are widely used in areas such as face recognition, autonomous driving, medical diagnosis, and monitoring."

OpenAI currently does not have a solution for multi-modal prompt injection image attacks, and users and enterprises can only rely on themselves. A blog post on the NVIDIA developer website (https://developer.nvidia.com/blog/mitigating-stored-prompt-injection-attacks-against-llm-applications/) provides some suggestions, including for all data storage and The system enforces minimum privilege access

How the multi-modal prompt injection image attack works

The multi-modal prompt injection attack exploits GPT-4V’s ability to process visual images To exploit the vulnerability to execute undetected malicious commands, GPT-4V relies on a visual transformation encoder to convert images into latent space representations. Image and text data are combined to generate responses. The model has no way to clean the visual input before encoding. An attacker can embed any number of commands and GPT-4 will consider them to be legitimate commands. An attacker who automatically performs a multi-modal hint injection attack on a private LLM would go unnoticed.

Containing Injected Image Attacks

Disturbingly, the problem with this unprotected attack vector of images is that an attacker could potentially inject LLM training data into Data fidelity gradually decreases as it becomes less trustworthy over time. A recent research paper (https://arxiv.org/pdf/2306.05499.pdf) provides guidelines on how to better protect LLM from hint injection attacks. To determine the extent of the risk and potential solutions, the team of researchers conducted a series of experiments designed to evaluate the effectiveness of injection attacks against applications that incorporate LLM. The research team found that 31 applications integrating LLM are vulnerable to injection attacks

The research paper makes the following recommendations on curbing injected image attacks:

Improve user Input cleanliness and verification procedures Identity Access Management (IAM) and least privilege access are basic configurations for enterprises that standardize on private LLM. LLM providers need to consider performing a more thorough cleaning before passing image data for processing

What needs to be rewritten is: 2. Improve the platform architecture and separate user input from system logic

The purpose should be to eliminate the risk of user input directly affecting LLM code and data. Any image cues need to be handled so as not to impact internal logic or workflow.

Use a multi-stage processing workflow to identify malicious attacks

We can build a multi-stage process to catch image-based attacks early to better manage this threat

4. Customizing defense prompts to prevent jailbreaking

Jailbreaking is a common prompt engineering technique used to mislead LLM into performing illegal actions. Attaching prompts to malicious-looking image inputs helps protect LLM. . However, researchers warn that advanced attacks can still bypass this approach.

A threat that is gaining momentum

As more and more LLMs transition to multi-modal models, images have become the latest threat vector that attackers can rely on. Used to bypass and redefine protection measures. Image-based attacks vary in severity, from simple commands to more complex attack scenarios designed to cause industrial damage and spread widespread misinformation

This article is sourced from: https: //venturebeat.com/security/why-gpt-4-is-vulnerable-to-multimodal-prompt-injection-image-attacks/. For reprint, please indicate the source

The above is the detailed content of Why is GPT-4P vulnerable to multi-modal hint injection image attacks?. For more information, please follow other related articles on the PHP Chinese website!