As the ins and outs of the potential of AI chatbots continue to make headlines, the frenzy surrounding ChatGPT remains at a fever pitch. One question that has captured the attention of many in the security community is whether the technology's ingestion of sensitive business data poses risks to organizations. There was concern that if someone entered sensitive information — quarterly reports, internal presentation materials, sales numbers, etc. — and asked ChatGPT to write text around it, anyone could get the company's information simply by asking ChatGPT.
The impact can be far-reaching: Imagine working on an internal presentation that contains new company data that reveals something to be discussed at a board meeting corporate issues discussed above. Leaking this proprietary information out could damage stock prices, consumer attitudes and customer confidence. Worse, legal items on the leaked agenda could expose the company to real liability. But can any of these things really happen just by stuff put into a chatbot?
Research firm Cyberhaven explored this concept in February, focusing on how OpenAI used what people input into ChatGPT as training data to improve its technology, with output that closely resembled what was input. Cyberhaven claims that confidential data entered into ChatGPT could be leaked to third parties if the third party asks ChatGPT certain questions based on information provided by executives.
The UK’s National Cyber Security Center (NCSC) shared further insight into the matter in March, stating that ChatGPT and other large language models (LLMs) currently do not automatically add information from queries to the model for Others inquire. That is, including the information in the query does not result in potentially private data being incorporated into the LLM. "However, queries will be visible to the organization providing the LLM (and in the case of ChatGPT, also to OpenAI)," it wrote.
"These queries have been stored and will almost certainly be used to develop an LLM service or model at some point. This may mean that the LLM provider (or its partners/contractors) is able to read the queries and They may be incorporated into future releases in some way," it added. Another risk, which increases as more organizations produce and use LLMs, is that queries stored online could be hacked, leaked or accidentally made public, the NCSC writes.
Ultimately, there are real reasons to be concerned about sensitive business data being entered and used by ChatGPT, although the risk may not be as widespread as some headlines make it out to be.
LLM exhibits a type of emergent behavior called situated learning. During a session, when the model receives inputs, it can perform tasks based on the context contained in those inputs. "This is most likely the phenomenon people are referring to when they are concerned about information leakage. However, it is impossible for information from one user's session to be leaked to another user," Andy Patel, senior researcher at WithSecure, told CSO. "Another concern is that prompts entered into the ChatGPT interface will be collected and used for future training data."
While concerns about chatbots ingesting and then regurgitating sensitive information are valid, Patel said A new model needs to be trained to integrate this data. Training an LLM is an expensive and lengthy process, and he said he would be surprised if a model could be trained on the data collected by ChatGPT in the near future. "If a new model is eventually created that contains collected ChatGPT hints, our fear turns to membership inference attacks. Such attacks have the potential to expose credit card numbers or personal information in the training data. However, there are no targets for supporting ChatGPT and others like it. The system's LLM proves membership inference attacks." This means that future models are extremely unlikely to be vulnerable to membership inference attacks.
Wicus Ross, senior security researcher at Orange Cyberdefense, said the issue is most likely caused by an outside provider that doesn’t clearly state its privacy policy , so using them with other security tools and platforms could put any private data at risk. “SaaS platforms such as Slack and Microsoft Teams have clear data and processing boundaries, and the risk of data exposure to third parties is low. However, if third-party plug-ins or bots are used to enhance the service, whether they are related to artificial intelligence or not, associated, these clear lines can quickly become blurred,” he said. "In the absence of an explicit statement from the third-party processor that the information will not be disclosed, you must assume that it is no longer private."
Neil Thacker, EMEA chief information security officer at Netskope, told CSO that in addition to the sensitive data shared by regular users, companies should also be aware of prompt injection attacks that could reveal previous instructions provided by developers when adjusting tools, or Causes it to ignore previously programmed instructions. "Recent examples include Twitter pranksters changing the bot's behavior and an issue with Bing Chat, where researchers found a way for ChatGPT to reveal instructions that were previously supposed to be hidden, possibly written by Microsoft."
According to Cyberhaven, sensitive data currently accounts for 11% of content posted by employees to ChatGPT, and the average company leaks sensitive data to ChatGPT hundreds of times a week. “ChatGPT is moving from hype to the real world, and organizations are trying to implement actual implementations in their operations to join other ML/AI-based tools, but caution needs to be exercised, especially when sharing confidential information,” Thacker said. “All aspects of data ownership should be considered, as well as what the potential impact would be if the organization hosting the data were breached. As a simple exercise, information security professionals should at least be able to identify the data that might be accessed if these services were breached category."
Ultimately, it is the business's responsibility to ensure that its users fully understand what information should and should not be disclosed to ChatGPT. The NCSC said organizations should be very careful about the data they choose to submit in prompts: "You should ensure that those who want to try LLM can, but do not put organizational data at risk."
However, Cyberhaven warns that identifying and controlling the data employees submit to ChatGPT is not without its challenges. "When employees enter company data into ChatGPT, they don't upload the file, but instead copy and paste the content into their web browser. Many security products are designed around protecting files (marked as confidential) from being uploaded , but once the content has been copied from the file, they cannot track it," it reads. Additionally, Cyberhaven said corporate data that goes into ChatGPT often does not contain identifiable patterns that security tools look for, such as credit card numbers or Social Security numbers. “Today’s security tools can’t differentiate between a person typing a cafeteria menu and a company’s merger and acquisition plan without understanding its context.” To improve visibility, Thacker said, organizations should add more features to their secure web gateways Implement policies on the (SWG) to identify the use of AI tools, and also apply data loss prevention (DLP) policies to identify what data is submitted to these tools.
Michael Covington, vice president of portfolio strategy at Jamf, said organizations should update their information protection policies to ensure the types of applications that are acceptable for handling confidential data are properly documented. “Controlling the flow of information starts with well-documented and informed policies,” he said. "Additionally, organizations should explore how they can leverage these new technologies to improve their businesses in thoughtful ways. Rather than shying away from these services out of fear and uncertainty, invest some people in exploring new tools that show potential so you can Understand the risks early and ensure adequate protection is in place when early end-user adopters want to start using these tools”
The above is the detailed content of Sharing sensitive business data with ChatGPT may be risky. For more information, please follow other related articles on the PHP Chinese website!