Your Customer Data Is Training Someone Else’s AI — Maybe

Last year, a mid-size SaaS company I consulted for discovered that their sales team had been pasting customer contract details into a public AI chatbot for six months. No malice — they just wanted faster email drafts. But that data was being used to train the model, and it included pricing terms, company names, and contact info from over 2,000 accounts. This isn’t rare. A Cisco survey from late 2025 found that 48% of employees admit to entering confidential company data into generative AI tools.

The gap between what AI tools can do with your data and what you think they’re doing is wider than most teams realize. Here’s what’s actually happening behind the scenes — and what you should do about it.

How AI Tools Actually Use Your Data

There are three distinct ways an AI tool might handle the information you feed it:

Training data ingestion. The tool uses your inputs to improve its underlying model. This is the most controversial use. Your data gets mixed into a massive dataset and shapes future responses for all users.

Contextual processing. The tool processes your input to generate a response, then discards it. Your data touches the model temporarily but doesn’t permanently alter it.

Stored data with access controls. The tool saves your data in your account, uses it to personalize your experience, but keeps it isolated from other customers and from model training.

The problem? Most tools don’t make it obvious which approach they use. You have to dig through privacy policies, data processing agreements (DPAs), and sometimes API-specific terms that differ from the consumer product terms.

What the Major CRM and AI Platforms Actually Say

I’ve read through the current data policies of the most common tools showing up in CRM tech stacks. Here’s what they actually commit to — not marketing language, but the contractual terms.

Salesforce Einstein and Agentforce

Salesforce has been the most aggressive about data trust positioning, and to their credit, their policies back it up. Under the Einstein Trust Layer, customer data sent to LLMs is not retained by the model provider and is not used for training. They’ve implemented dynamic grounding and data masking that strips PII before it hits the model.

The key detail: this only applies to Einstein features within the Salesforce platform. If your team copies data out of Salesforce and pastes it into a third-party AI tool, none of those protections apply. I’ve seen this happen at almost every org I’ve worked with.

What to verify: Check that your Salesforce org has the Einstein Trust Layer enabled (it’s on by default for most editions as of Spring ‘26, but older orgs migrated from Classic may need manual activation). Review the “Einstein Audit Trail” in Setup to see exactly what data is being sent to models.

HubSpot AI (Breeze)

HubSpot updated its AI terms significantly in early 2026. Their Breeze AI features — including content generation, predictive lead scoring, and the AI assistant — process data within HubSpot’s infrastructure. They explicitly state that customer CRM data is not used to train their models.

However, there’s a nuance worth understanding. HubSpot does use aggregated, anonymized usage patterns to improve their AI features. This means the structure of how people use the tools influences model development, even if your specific customer records don’t. For most organizations, this is a reasonable trade-off. For those in regulated industries, it’s worth discussing with your compliance team.

What to verify: In your HubSpot portal, go to Settings > AI > Data Preferences. You’ll find toggles for different types of data sharing. The “AI improvement” toggle controls aggregated usage data. Turn it off if your compliance requirements demand it.

Zoho CRM and Zia

Zoho CRM takes a notably privacy-forward stance. They don’t serve ads, don’t sell data, and Zia (their AI assistant) processes data within Zoho’s own servers. They’ve been SOC 2 Type II certified for years and offer data residency options in the US, EU, India, Australia, Japan, and Canada.

The catch with Zoho is that their AI capabilities, while privacy-respecting, lag behind Salesforce and HubSpot in sophistication. You’re trading some functionality for stricter data controls. For teams handling sensitive healthcare, financial, or legal data, that trade-off often makes sense.

What to verify: Confirm your data residency settings in Zoho’s admin panel under Security > Data Hosting. If you’re operating under GDPR, make sure your data center is set to the EU.

OpenAI (ChatGPT and API)

This is the tool most likely to be in use outside your official CRM stack, brought in by individual reps who find it useful. OpenAI's policies differ dramatically depending on how you access the tool.

ChatGPT Free and Plus: By default, your conversations can be used to train models. You can opt out in Settings > Data Controls > “Improve the model for everyone.” But here’s the thing — even with this toggle off, OpenAI retains conversations for 30 days for safety monitoring.

ChatGPT Team, Enterprise, and API: Data is not used for training. Period. Conversations are encrypted at rest and in transit, and the retention window for abuse monitoring is shorter. If your org is going to use ChatGPT, the Team tier ($25/user/month as of early 2026) is the minimum for any business handling customer data.

What to do right now: If you don’t have an enterprise OpenAI agreement, assume at least some of your employees are using the free version with customer data. A 2025 Gartner survey found that 68% of CRM users have pasted customer information into a generative AI tool that wasn’t approved by IT.

Microsoft Copilot (Dynamics 365 and M365)

Microsoft’s Copilot for Dynamics 365 operates under their standard commercial data protection terms. Your prompts and outputs are not used to train foundation models. Data stays within your Microsoft 365 tenant boundary, and the same compliance certifications that cover your Outlook and SharePoint data extend to Copilot interactions.

The complexity with Microsoft’s approach is the layered architecture. Copilot sits on top of Azure OpenAI Service, which has its own terms, which run on infrastructure that may span multiple data centers. Microsoft publishes detailed data flow documentation, but it takes effort to map exactly where your CRM data travels during an AI interaction.

What to verify: In the Microsoft 365 admin center, review the Copilot data residency settings. If you’re using Dynamics 365, check that your Azure OpenAI resource is provisioned in the same region as your Dynamics instance.

The Shadow AI Problem Is Your Biggest Risk

Every policy I’ve described above has one fatal flaw: it only applies when people use the approved tool.

I ran an audit for a 200-person sales org last quarter. They had Salesforce Einstein properly configured with all the right data protections. But when I surveyed the team anonymously, 73% of reps admitted to using at least one unapproved AI tool weekly for work tasks. The most common uses:

  • Pasting call transcripts into ChatGPT to generate follow-up emails
  • Uploading spreadsheets of lead data to AI analysis tools
  • Using AI browser extensions that read page content, including CRM records displayed on screen

The fix isn’t banning AI tools — that doesn’t work and kills productivity. The fix is giving people approved alternatives that are just as convenient.

Building an Approved AI Tool Stack

Here’s a practical framework I’ve used with multiple clients:

Step 1: Audit what’s being used. Send an anonymous survey. Ask specifically about AI tools, not just “software.” People don’t think of ChatGPT or browser extensions as “software” the same way they think of Salesforce.

Step 2: Identify the top 3-5 use cases. Usually it’s email drafting, meeting summaries, data analysis, content creation, and research. Don’t try to cover everything — just the high-frequency tasks.

Step 3: Map each use case to an approved tool. For email drafting in a Salesforce shop, that’s Einstein Copilot. For meeting summaries, it might be your existing Zoom or Gong AI features. For general research and drafting, a ChatGPT Team or Enterprise subscription with proper guardrails.

Step 4: Make the approved path easier than the unapproved path. If your reps need to click through four screens to access the AI feature in your CRM but can just open a new browser tab for ChatGPT, they’ll choose the browser tab every time. Reduce friction on the approved tools.

Step 5: Set clear, simple rules. Not a 40-page policy — a one-page reference card. Something like: “You can paste email drafts and general business questions into approved AI tools. Never paste: customer names, contract terms, financial data, or health information. If you’re unsure, ask.”
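The "never paste" list on that reference card can even be backed up by a lightweight pre-flight check before text leaves your environment. The sketch below is illustrative only, not a substitute for a real DLP product; the `check_prompt` helper and the regex patterns are assumptions for the example, and real names and contract terms would need far more sophisticated detection.

```python
import re

# Illustrative patterns for data that should never reach an unapproved AI tool.
# A production setup would use a proper DLP library; these regexes are a sketch.
SENSITIVE_PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "dollar amount": re.compile(r"\$\d[\d,]*(\.\d{2})?"),
    "ssn-like number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def check_prompt(text: str) -> list[str]:
    """Return the sensitive-data categories detected in text, if any."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]

draft = "Per the contract, Acme pays $4,200.00/mo. Contact: jane@acme.com"
violations = check_prompt(draft)
if violations:
    print("Blocked - remove:", ", ".join(violations))
```

A check like this catches the obvious cases (contact info, pricing figures) and, just as importantly, reminds the rep that the rule exists at the moment it matters.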

Compliance Frameworks That Actually Matter

GDPR and AI Processing

If you handle EU customer data, you need a legal basis for any AI processing. Most CRM tools rely on “legitimate interest” for AI features like lead scoring, but you should document this in your Records of Processing Activities (ROPA). Under the EU AI Act’s provisions that took effect in February 2025, AI systems used for profiling individuals (which includes predictive lead scoring) require a human-in-the-loop review process.

Practical step: Add an AI processing section to your ROPA. For each AI feature you use, document: what data goes in, what comes out, which vendor processes it, and where the data is stored.
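One way to keep those ROPA entries consistent is to maintain them as structured records and check completeness with a script, so a new AI feature can't quietly ship without its documentation. A minimal sketch, with the caveat that the field names here are illustrative, not GDPR-mandated wording:

```python
# Minimal sketch of ROPA entries for AI features; field names are illustrative.
REQUIRED_FIELDS = {"feature", "data_in", "data_out", "vendor", "storage_region"}

ropa_ai_entries = [
    {
        "feature": "Predictive lead scoring",
        "data_in": "Contact engagement history, firmographics",
        "data_out": "Lead score (0-100)",
        "vendor": "CRM vendor (in-platform model)",
        "storage_region": "EU",
    },
    {
        "feature": "Email draft generation",
        "data_in": "Rep-written prompt, contact first name",
        "data_out": "Draft email text",
        "vendor": "LLM provider via CRM integration",
        # storage_region missing - the check below should flag this entry.
    },
]

def incomplete_entries(entries):
    """Return the features whose ROPA entry is missing a required field."""
    return [e.get("feature", "<unnamed>") for e in entries
            if not REQUIRED_FIELDS <= e.keys()]

print(incomplete_entries(ropa_ai_entries))
```

Run something like this in CI or a quarterly review and incomplete entries surface automatically instead of during an audit.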

SOC 2 and AI Vendors

If your organization maintains SOC 2 compliance, your auditors will ask about AI tools starting in your next cycle — if they haven’t already. The AICPA’s 2025 guidance specifically calls out generative AI as a risk area for the Confidentiality and Processing Integrity trust service criteria.

Before your next audit, prepare a list of every AI tool in use (including shadow AI you’ve discovered), their data handling terms, and the controls you’ve implemented. Your auditor will want to see evidence that you’ve evaluated each tool’s data practices, not just that you have a policy on paper.

HIPAA and AI Tools

If you’re a covered entity or business associate, the rules are blunt: don’t put PHI into any AI tool that doesn’t have a signed Business Associate Agreement (BAA). As of early 2026, the tools that offer BAAs for AI features include Salesforce Einstein, Microsoft Copilot for Dynamics 365, and Google Vertex AI (when accessed through Google Cloud with a BAA in place). OpenAI offers BAAs for Enterprise-tier customers.

HubSpot does not currently offer a BAA that covers their Breeze AI features, which is a significant limitation for healthcare organizations using HubSpot as their CRM. If you’re in this situation, you can still use HubSpot for CRM but should disable AI features that process patient-related records.

Data Handling Comparison Table

Here’s a quick reference for the tools I see most often in CRM stacks:

| Tool | Trains on Your Data? | Data Residency Options | BAA Available | SOC 2 Certified |
|---|---|---|---|---|
| Salesforce Einstein | No | US, EU, multiple regions | Yes | Yes |
| HubSpot Breeze | No (aggregated usage patterns used) | US, EU | Limited (not AI features) | Yes |
| Zoho Zia | No | US, EU, IN, AU, JP, CA | Yes | Yes |
| ChatGPT Free/Plus | Yes (opt-out available) | No | No | No |
| ChatGPT Team/Enterprise | No | Enterprise: Yes | Enterprise: Yes | Yes |
| Microsoft Copilot (Dynamics) | No | Follows M365 tenant | Yes | Yes |
| Google Gemini (Workspace) | No (paid tiers) | Follows Google Workspace | Yes (with Cloud) | Yes |
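If your stack grows past what fits in a table, the same reference data can live as a small structured dataset you can filter, for questions like "which tools could a HIPAA-covered team even consider?" A sketch using a subset of the rows above (the record fields are my own simplification; HubSpot's BAA is treated as unavailable here because it doesn't cover the AI features):

```python
# A subset of the comparison table above as filterable records.
# Field names are a simplification for this example.
tools = [
    {"name": "Salesforce Einstein", "trains_on_data": False, "baa": True,  "soc2": True},
    {"name": "HubSpot Breeze",      "trains_on_data": False, "baa": False, "soc2": True},
    {"name": "Zoho Zia",            "trains_on_data": False, "baa": True,  "soc2": True},
    {"name": "ChatGPT Free/Plus",   "trains_on_data": True,  "baa": False, "soc2": False},
    {"name": "ChatGPT Enterprise",  "trains_on_data": False, "baa": True,  "soc2": True},
    {"name": "Copilot (Dynamics)",  "trains_on_data": False, "baa": True,  "soc2": True},
]

def hipaa_candidates(tools):
    """Tools that offer a BAA and don't train on customer data."""
    return [t["name"] for t in tools if t["baa"] and not t["trains_on_data"]]

print(hipaa_candidates(tools))
```

Keeping the list in a repo also gives you a change history, which is handy evidence for the SOC 2 review discussed above.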

Five Things to Do This Week

You don’t need a six-month project to improve your AI data privacy posture. Here’s what you can do in the next five business days:

Monday: Send an anonymous survey to your team asking which AI tools they use for work. Keep it short — five questions max. Include browser extensions and mobile apps.

Tuesday: Review the data control settings in your primary CRM. For Salesforce, check the Einstein Trust Layer configuration. For HubSpot, check Settings > AI > Data Preferences. For Zoho, check Security > Data Hosting.

Wednesday: Read the actual DPA (Data Processing Agreement) for your top three AI tools. Not the marketing page — the legal document. Look for three things: whether your data is used for training, where it’s stored, and what the retention period is.

Thursday: Draft a one-page AI usage guide for your team. Cover what’s approved, what’s not, and the specific types of data that should never go into any AI tool.

Friday: Set up a quarterly AI tool review on your calendar. The policies and features change constantly. What was true six months ago may not be true today. Revisit vendor terms and your internal controls every 90 days.

Keep Your Stack Clean, Keep Your Data Safe

The AI capabilities inside modern CRMs are genuinely useful — they save time on email drafting, surface better leads, and automate the repetitive work that burns out sales teams. But those benefits evaporate if a data breach or compliance violation hits because nobody checked the fine print.

Pick tools with clear data handling commitments, configure them correctly, and make it easier for your team to use the approved tools than to go rogue. For a side-by-side look at how different CRMs handle AI features, check out our CRM comparison page and individual tool reviews to find the right fit for your data requirements.


Disclosure: Some links on this page are affiliate links. We may earn a commission if you make a purchase, at no extra cost to you. This helps us keep the site running and produce quality content.