PII Leakage to AI Models
Your app sends personally identifiable information — emails, names, passwords, phone numbers — to external AI APIs, exposing user data to third-party model providers.
How It Works
When you build AI features, it's tempting to dump full user records into the context window so the model has 'all the information it needs'. But that data leaves your servers and goes to OpenAI, Anthropic, or whoever you're using. Even if they don't train on it, your privacy policy probably doesn't cover this, and a data breach at their end becomes your breach too.
// BAD: sending full user record including PII to the AI
const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
const prompt = `Summarize this user's activity: ${JSON.stringify(user)}`;
// user object contains: email, phone, address, payment_method...

// GOOD: send only the non-PII fields needed for the task
const activity = await db.query(
'SELECT action, created_at FROM audit_log WHERE user_id = $1 LIMIT 20',
[userId]
);
const prompt = `Summarize this activity log: ${JSON.stringify(activity)}`;
Real-World Example
Several healthcare SaaS companies have faced regulatory scrutiny after sending patient names and symptoms to OpenAI's API without proper BAAs (Business Associate Agreements), violating HIPAA. The fix is data minimization — send only what the model actually needs.
How to Prevent It
- Apply data minimization: query only the fields the AI needs, never SELECT *
- Anonymize or pseudonymize user data before sending it (replace emails with user_123, etc.)
- Review your AI provider's data processing agreements and ensure they cover your use case
- Log what data you send to external AI APIs so you can audit it
- Never send passwords, payment details, SSNs, or health data to any external model
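The pseudonymization step above can be sketched as a small redaction pass that runs before any prompt is built. This is an illustrative helper, not a library API — the function name, regexes, and `user_N` placeholder scheme are assumptions for this sketch, and real PII detection usually needs more robust patterns or a dedicated tool:

```javascript
// Illustrative pre-prompt redaction pass. Replaces emails and phone numbers
// with stable placeholders so the model sees consistent identifiers without
// receiving the raw PII. The regexes are simplified for the sketch.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.-]+/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

function pseudonymize(text) {
  const map = new Map(); // placeholder -> original value, kept server-side only
  let counter = 0;
  const swap = (match) => {
    // Reuse an existing placeholder if we've seen this value before
    for (const [placeholder, original] of map) {
      if (original === match) return placeholder;
    }
    const placeholder = `user_${++counter}`;
    map.set(placeholder, match);
    return placeholder;
  };
  const redacted = text.replace(EMAIL_RE, swap).replace(PHONE_RE, swap);
  return { redacted, map };
}

const { redacted } = pseudonymize('Contact alice@example.com or +1 555 123 4567');
// redacted no longer contains the raw email address or phone number
```

Keeping the placeholder-to-original map on your side lets you re-identify users in the model's response without the PII ever leaving your servers.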
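The audit-logging bullet can be as simple as a thin wrapper around every outbound AI call. Here `callModel` stands in for whatever client you actually use (OpenAI, Anthropic, etc.), and the log record shape is an assumption for this sketch, not a standard:

```javascript
// Illustrative audit wrapper for outbound AI calls. Logs metadata about what
// was sent before forwarding the prompt to the real client function.
async function callModelWithAudit(callModel, prompt, meta = {}) {
  const record = {
    timestamp: new Date().toISOString(),
    provider: meta.provider ?? 'unknown',
    purpose: meta.purpose ?? 'unspecified',
    promptLength: prompt.length,
    // Store a truncated preview (or a hash) rather than the full prompt,
    // since the prompt itself may contain sensitive data.
    promptPreview: prompt.slice(0, 80),
  };
  console.log(JSON.stringify(record)); // replace with your real audit log sink
  return callModel(prompt);
}
```

An audit trail like this is what lets you answer "what did we actually send to the provider?" during a breach investigation or a compliance review.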