PII Leakage to AI Models
Your app sends personally identifiable information — emails, names, passwords, phone numbers — to external AI APIs, exposing user data to third-party model providers.
How It Works
When you build AI features, it's tempting to dump full user records into the context window so the model has 'all the information it needs'. But that data leaves your servers and goes to OpenAI, Anthropic, or whoever you're using. Even if they don't train on it, your privacy policy probably doesn't cover this, and a data breach at their end becomes your breach too.
// BAD: sending full user record including PII to the AI
const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
const prompt = `Summarize this user's activity: ${JSON.stringify(user)}`;
// user object contains: email, phone, address, payment_method...

// GOOD: send only the non-PII fields needed for the task
const activity = await db.query(
'SELECT action, created_at FROM audit_log WHERE user_id = $1 LIMIT 20',
[userId]
);
const prompt = `Summarize this activity log: ${JSON.stringify(activity)}`;
Real-World Example
Several healthcare SaaS companies have faced regulatory scrutiny after sending patient names and symptoms to OpenAI's API without proper BAAs (Business Associate Agreements), violating HIPAA. The fix is data minimization — send only what the model actually needs.
How to Prevent It
- Apply data minimization: query only the fields the AI needs, never SELECT *
- Anonymize or pseudonymize user data before sending it (replace emails with user_123, etc.)
- Review your AI provider's data processing agreements and ensure they cover your use case
- Log what data you send to external AI APIs so you can audit it
- Never send passwords, payment details, SSNs, or health data to any external model
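The pseudonymization step above can be sketched as a small redaction pass that runs before any prompt is built. This is an illustrative helper, not a library API — the function name, regexes, and `user_N` placeholder scheme are assumptions for this sketch, and real PII detection usually needs more robust patterns or a dedicated tool:

```javascript
// Illustrative pre-prompt redaction pass. Replaces emails and phone numbers
// with stable placeholders so the model sees consistent identifiers without
// receiving the raw PII. The regexes are simplified for the sketch.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

function pseudonymize(text) {
  const map = new Map(); // placeholder -> original value, kept server-side only
  let counter = 0;
  const swap = (match) => {
    // Reuse an existing placeholder if we've seen this value before
    for (const [placeholder, original] of map) {
      if (original === match) return placeholder;
    }
    const placeholder = `user_${++counter}`;
    map.set(placeholder, match);
    return placeholder;
  };
  const redacted = text.replace(EMAIL_RE, swap).replace(PHONE_RE, swap);
  return { redacted, map };
}

const { redacted } = pseudonymize('Contact alice@example.com or +1 555 123 4567');
// redacted no longer contains the raw email address or phone number
```

Keeping the placeholder-to-original map on your side lets you re-identify users in the model's response without the PII ever leaving your servers.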
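The audit-logging bullet can be as simple as a thin wrapper around every outbound AI call. Here `callModel` stands in for whatever client you actually use (OpenAI, Anthropic, etc.), and the log record shape is an assumption for this sketch, not a standard:

```javascript
// Illustrative audit wrapper for outbound AI calls. Logs metadata about what
// was sent before forwarding the prompt to the real client function.
async function callModelWithAudit(callModel, prompt, meta = {}) {
  const record = {
    timestamp: new Date().toISOString(),
    provider: meta.provider ?? 'unknown',
    purpose: meta.purpose ?? 'unspecified',
    promptLength: prompt.length,
    // Store a truncated preview (or a hash) rather than the full prompt,
    // since the prompt itself may contain sensitive data.
    promptPreview: prompt.slice(0, 80),
  };
  console.log(JSON.stringify(record)); // replace with your real audit log sink
  return callModel(prompt);
}
```

An audit trail like this is what lets you answer "what did we actually send to the provider?" during a breach investigation or a compliance review.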