Generative AI Tools and Data Leaks: Preventing Data Loss

Among the many cybersecurity concerns emerging from the proliferation and rapid advancement of generative AI tools like ChatGPT, data leakage is a risk that has largely flown under the radar. However, several recent incidents have prompted companies to take the threat more seriously and to reconsider whether employees should be permitted to use these tools at all. This article discusses data leaks in generative AI tools and offers data loss prevention tips if your business allows employees to use ChatGPT and similar services.

Samsung ChatGPT Data Leak  

The incident that set the wheels in motion was Samsung's decision in May 2023 to ban staff from using ChatGPT and other AI chatbots on company-owned devices and internal networks. The decision came after several employees revealed sensitive company information to the chatbot.

The data leak involved employees sharing source code with ChatGPT and asking the tool to check it for errors. Because OpenAI may retain the questions, prompts and queries it receives to improve the ChatGPT models, sharing source code with the tool amounts to unintentionally disclosing confidential, proprietary or sensitive information. Another staff member uploaded the recorded details of an internal meeting.

Interestingly, OpenAI released an enterprise-grade version of ChatGPT a few months after the Samsung data leak. The marketing copy for this version specifically touts enterprise-grade privacy and security, emphasizing that OpenAI does "not train on your business data or conversations, and our models don't learn from your usage."

Other Companies That Have Banned Generative AI 

In light of this new security-focused version of ChatGPT, has its release stopped companies from adopting similar policies to Samsung? The answer appears to be…not really. Study findings released in September 2023 found that 75% of 2,000 surveyed companies are implementing or considering bans on generative AI tools in the workplace. Data security, privacy and brand reputation were cited as key factors in the bans.  

A whole host of other big-name companies have also banned the likes of ChatGPT in the workplace. Among these organizations are Apple, Amazon, Verizon and JP Morgan.  

Data Leak Risks in Generative AI

So, here we have some scenarios where companies are worried about employees sharing confidential or sensitive information with generative AI tools. But what exactly are the risks that underpin these concerns? After all, ChatGPT is not sentient or advanced enough to autonomously do anything with the data.  

However, there is somewhat of a gray area regarding how the information stored from user queries might get used or who might have access to it. In the worst-case outcome, sensitive company data shared in an input (i.e., a question or prompt) could potentially be presented as a future output (an answer) to a user outside that company by a generative AI tool. 

Much of the risk stems from the wide variety of use cases for generative AI and the lack of inherent security-conscious thinking among employees in relation to the risks posed by these tools. In a business context, some of the things employees might do with ChatGPT include: 

  • Sharing proprietary source code to check for mistakes  
  • Uploading internal meeting notes and asking the tool to summarize them 
  • A CEO asking the tool to rewrite an important email to the board of directors for clarity

In some cases, these data leaks could result in compliance headaches and penalties. Employees are not solely to blame, though; the root cause is more often a lack of security training and awareness around safely using emerging technologies that grow more powerful by the year.

Data Loss Prevention & Generative AI

Some decision-makers at high-profile companies have elected to outright ban ChatGPT. This decision is understandable, but it also misses out on the many productivity benefits that ChatGPT can bring to businesses. So, what are some ways to prevent data loss and leaks while reaping the benefits of generative AI? 

  • Collaborate on and write a clear, robust security policy governing how employees may use generative AI tools. Describe in this policy what types of inputs are allowed, what restrictions apply to the size of inputs and who can access such tools.
  • Ensure effective ongoing security training at all company levels so everyone knows what constitutes sensitive information and how to recognize it. All business users should be aware of the risks of sharing confidential or sensitive information, even with seemingly innocuous tools. Organize workshop sessions to highlight the risks of sharing sensitive data with AI tools.  
  • Conduct regular audits of tool usage to ensure compliance with company policies and to detect any potential data leakage. 
  • Use DLP software to monitor, control and block data transfers that violate company policies. Ensure that your chosen DLP solution can monitor cloud and web traffic, including interactions with AI models online. 
  • If feasible, deploy AI models locally within your company’s infrastructure. This minimizes data transmission outside the organization and provides greater control over data access and processing. 
  • Make sure you have adequate endpoint security to log activities on user devices for auditing purposes if there’s a suspicion of data leakage. Endpoint security also helps with remote wiping or device encryption in cases of device loss or theft, where outsiders might be able to access sensitive AI chat histories. 
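To make the DLP point above concrete, here is a minimal sketch of a pre-send prompt filter that scans text against regular expressions before it leaves the device. The patterns shown (the internal hostname, the specific secret formats) are hypothetical examples; a real deployment would use your organization's own data classification rules and a dedicated DLP product.

```python
import re

# Hypothetical patterns -- replace with rules from your own data classification policy.
SENSITIVE_PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "internal hostname": re.compile(r"\b[\w-]+\.internal\.example\.com\b"),
}

def scan_prompt(prompt: str) -> list:
    """Return the labels of any sensitive patterns found in the prompt."""
    return [label for label, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

def is_safe_to_send(prompt: str) -> bool:
    """Allow the prompt only if no sensitive pattern matches."""
    return not scan_prompt(prompt)
```

A wrapper around the company's approved AI client could call `is_safe_to_send` on every prompt and block or log violations, mirroring what commercial DLP software does at the network layer.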

To Ban or Not?  

Ultimately, it seems that a large proportion of businesses are concerned enough about AI-related data leaks to ban these tools outright. The judgment call is nuanced, though, especially as new tools of this kind continue to appear, which could eventually make banning a futile exercise.

Whether or not you decide to allow generative AI, robust endpoint security is vital for mitigating the many cybersecurity incidents, including data leaks, that occur on user devices such as workstations, laptops and BYOD mobiles.
