Is anyone surprised that prompts sent to GenAI tools include important sensitive information?
SCWorld.com reported that “Nearly 22% of files and more than 4% of prompts employees send to generative AI (GenAI) tools contain sensitive information, according to an analysis by Harmonic Security published Thursday. Harmonic analyzed 1 million prompts and 20,000 uploaded files sent by workers at companies across the United States and United Kingdom to more than 300 different GenAI and AI-enabled software-as-a-service (SaaS) applications. The prompts and uploads were recorded by the Harmonic Security Browser Extension.” The August 1, 2025 article entitled “AI/ML, Data Security, Application security, Generative AI” (https://tinyurl.com/5hczjb93) included these comments:
The majority of sensitive prompts – 72.6% – went to OpenAI’s ChatGPT, and 26.3% of all sensitive prompts went to a free version of ChatGPT rather than ChatGPT Enterprise.
“Enterprise accounts often mean that security teams get logs of usage, whereas with personal accounts they are flying blind. Personal accounts are often free and that can mean the AI tools are training on input data,” Harmonic Security Vice President Michael Marriott told SC Media.
Harmonic also found that 15.13% of sensitive prompts and files sent to Google Gemini went through free accounts, while 47.42% of the sensitive data uploaded to Perplexity went to standard, non-enterprise accounts.
While the top six AI tools that received sensitive data were ChatGPT (72.6%), Microsoft Copilot (13.7%), Gemini (5%), Anthropic’s Claude (2.5%), Quora’s Poe (2.1%) and Perplexity (1.8%), Harmonic’s analysis found a wide variety of new tools being adopted. The average company saw 23 previously unknown GenAI tools being used by employees for the first time during the data collection period between April and June 2025.
Sensitive files accounted for 13.9% of all data exposure events to AI, and 54.9% of files sent to AI tools were PDFs. Files made up 79.7% of credit card exposures, 75.3% of customer profile leaks and 68.8% of employee personally identifiable information (PII) exposures.
The most common type of sensitive data leaked was proprietary code, with this data type most often being sent to ChatGPT, Claude, DeepSeek and Baidu Chat.
No surprises here!