High · CWE-359 · OWASP LLM02:2025

PII Leakage to AI Models

Your app sends personally identifiable information — emails, names, passwords, phone numbers — to external AI APIs, exposing user data to third-party model providers.

How It Works

When you build AI features, it's tempting to dump full user records into the context window so the model has 'all the information it needs'. But that data leaves your servers and goes to OpenAI, Anthropic, or whoever you're using. Even if they don't train on it, your privacy policy probably doesn't cover this, and a data breach at their end becomes your breach too.

Vulnerable Code
// BAD: sending full user record including PII to the AI
const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
const prompt = `Summarize this user's activity: ${JSON.stringify(user)}`;
// user object contains: email, phone, address, payment_method...
Secure Code
// GOOD: send only the non-PII fields needed for the task
const activity = await db.query(
  'SELECT action, created_at FROM audit_log WHERE user_id = $1 LIMIT 20',
  [userId]
);
const prompt = `Summarize this activity log: ${JSON.stringify(activity)}`;

Real-World Example

Several healthcare SaaS companies have faced regulatory scrutiny after sending patient names and symptoms to OpenAI's API without proper BAAs (Business Associate Agreements), violating HIPAA. The fix is data minimization — send only what the model actually needs.

How to Prevent It

  • Apply data minimization: query only the fields the AI needs, never SELECT *
  • Anonymize or pseudonymize user data before sending it (replace emails with user_123, etc.)
  • Review your AI provider's data processing agreements and ensure they cover your use case
  • Log what data you send to external AI APIs so you can audit it
  • Never send passwords, payment details, SSNs, or health data to any external model
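As a sketch of the pseudonymization step above, the helper below drops known PII fields and swaps real ids for stable placeholders before anything reaches a prompt. The field names (`email`, `phone`, etc.) and the `user_123` naming scheme are illustrative assumptions, not a fixed API:

```javascript
// Illustrative pseudonymization before prompt construction.
// PII_FIELDS and the placeholder scheme are assumptions for this sketch.
function pseudonymize(record, idMap = new Map()) {
  const PII_FIELDS = ['email', 'phone', 'address', 'payment_method', 'name'];
  const clean = {};
  for (const [key, value] of Object.entries(record)) {
    if (key === 'id') {
      // Replace the real id with a stable placeholder like user_1, user_2...
      if (!idMap.has(value)) idMap.set(value, `user_${idMap.size + 1}`);
      clean.id = idMap.get(value);
    } else if (!PII_FIELDS.includes(key)) {
      clean[key] = value;
    }
    // PII fields are dropped entirely rather than sent to the model
  }
  return clean;
}

const user = { id: 42, email: 'a@b.com', plan: 'pro', last_login: '2025-01-01' };
const prompt = `Summarize this user: ${JSON.stringify(pseudonymize(user))}`;
console.log(prompt);
// The prompt now contains { id: 'user_1', plan: 'pro', last_login: '2025-01-01' }
```

Keeping the `idMap` around lets you re-identify users in the model's response without the provider ever seeing real identifiers.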

Affected Technologies

Node.js, Python

Hogo detects this vulnerability automatically.
