XXE — XML External Entity Injection
XXE occurs when an XML parser is configured to process external entity references, allowing attackers to read arbitrary files from the server or trigger SSRF with a crafted XML payload.
How It Works
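XML supports external entities: references to external files or URLs declared in the document's DTD with `<!ENTITY`. If the parser resolves them, an attacker can submit XML that declares `<!ENTITY secret SYSTEM 'file:///etc/passwd'>` and references it as `&secret;`. The parser then substitutes the file's contents, and the server leaks them wherever the parsed value is echoed back. Some parsers resolve external entities by default and others only when an option is switched on, so check your parser's configuration rather than assuming it is safe.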
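To make the shape of such a payload concrete, here is a minimal sketch in TypeScript. The names `examplePayload` and `findExternalEntities` are hypothetical and introduced only for illustration; the helper uses a regular expression to list external entity declarations so you can see which parts of the document matter. It is not a real XML parser and should not be treated as a security control.

```typescript
// Illustrative XXE payload: the DOCTYPE declares an external entity named
// "secret", and the document body references it as &secret;.
const examplePayload = `<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY secret SYSTEM "file:///etc/passwd">
]>
<data>&secret;</data>`;

interface ExternalEntity {
  name: string;
  kind: 'SYSTEM' | 'PUBLIC';
  uri: string;
}

// Hypothetical helper (illustration only): finds <!ENTITY ... SYSTEM|PUBLIC ...>
// declarations with a regex and reports the last quoted literal as the URI.
function findExternalEntities(xml: string): ExternalEntity[] {
  const declaration =
    /<!ENTITY\s+(?:%\s+)?([^\s>]+)\s+(SYSTEM|PUBLIC)\s+([^>]*)>/g;
  const found: ExternalEntity[] = [];
  let decl: RegExpExecArray | null;
  while ((decl = declaration.exec(xml)) !== null) {
    // SYSTEM has one quoted literal (the URI); PUBLIC has two (public id, URI).
    const literal = /"([^"]*)"|'([^']*)'/g;
    let uri: string | undefined;
    let lit: RegExpExecArray | null;
    while ((lit = literal.exec(decl[3])) !== null) {
      uri = lit[1] !== undefined ? lit[1] : lit[2];
    }
    if (uri !== undefined) {
      found.push({ name: decl[1], kind: decl[2] as 'SYSTEM' | 'PUBLIC', uri });
    }
  }
  return found;
}
```

Calling `findExternalEntities(examplePayload)` reports a single `SYSTEM` entity named `secret` that points at `file:///etc/passwd`. An internal entity such as `<!ENTITY a "x">` is not reported, because it has no `SYSTEM` or `PUBLIC` identifier and never leaves the document.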

// BAD: entity substitution explicitly enabled (libxmljs with noent: true)
import { parseXml } from 'libxmljs';

export async function POST(req: Request) {
  const body = await req.text();
  // noent: true tells libxml2 to expand entities, including external SYSTEM ones
  const doc = parseXml(body, { noent: true });
  return Response.json({ xml: doc.toString() });
}

// GOOD: external entities are never resolved
import { parseStringPromise } from 'xml2js';

export async function POST(req: Request) {
  const body = await req.text();
  // xml2js does not resolve external entities, so no extra flag is needed here.
  // If using libxmljs instead, leave noent at its default (false).
  const data = await parseStringPromise(body, {
    explicitArray: false,
  });
  return Response.json(data);
}

Real-World Example
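XXE was reportedly used in the 2019 Facebook breach (via a third-party system) and has appeared in numerous enterprise attacks. It had its own category in the OWASP Top 10 2017 (A4) and is covered under Security Misconfiguration (A05) in the 2021 edition. It is commonly found in document upload features: SVG files are XML, and DOCX and XLSX files are ZIP archives of XML parts.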
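Because these upload formats reach an XML parser by different routes, it can help to know which one you are holding before any parsing happens. The sketch below is one possible approach, not a complete file-type detector: `sniffUpload` and `UploadKind` are hypothetical names, and the check looks only at leading bytes. It assumes UTF-8 or ASCII text for the XML case and does not handle UTF-16.

```typescript
type UploadKind = 'zip-container' | 'xml-text' | 'unknown';

// Hypothetical helper: classifies an upload by its first bytes.
function sniffUpload(bytes: Uint8Array): UploadKind {
  // ZIP local file header signature "PK\x03\x04".
  // DOCX and XLSX are ZIP archives whose members are XML parts.
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  ) {
    return 'zip-container';
  }

  // Plain XML such as SVG: skip an optional UTF-8 BOM and leading
  // whitespace, then expect '<' as the first meaningful byte.
  let i = 0;
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    i = 3;
  }
  while (
    i < bytes.length &&
    (bytes[i] === 0x20 || bytes[i] === 0x09 || bytes[i] === 0x0a || bytes[i] === 0x0d)
  ) {
    i++;
  }
  if (i < bytes.length && bytes[i] === 0x3c) {
    return 'xml-text';
  }
  return 'unknown';
}
```

A result of `'zip-container'` or `'xml-text'` only says that XML parsing is likely to follow. The parser that eventually reads the XML parts still needs external entities disabled.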
How to Prevent It
- Disable external entity processing in your XML parser configuration
- Use a modern parser that disables external entities by default (xml2js, fast-xml-parser with default settings)
- Validate that uploaded files are what they claim to be before parsing
- Consider JSON instead of XML for APIs — it doesn't have this attack class
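As a defense-in-depth measure alongside the parser settings above, some applications reject any XML that carries a DOCTYPE at all, since entity declarations can only appear inside a DTD. A minimal sketch, assuming the request body has already been decoded to a string; `assertNoDoctype` is a hypothetical helper name, and the check is deliberately conservative (it also rejects documents that merely mention `<!DOCTYPE` inside a comment or CDATA section).

```typescript
// Hypothetical guard: throws if the XML text contains a DOCTYPE declaration,
// otherwise returns the input unchanged so it can be passed on to a parser.
// This complements, and does not replace, disabling external entities in
// the parser itself.
function assertNoDoctype(xml: string): string {
  // Case-insensitive on purpose: stricter than the XML grammar requires.
  if (/<!DOCTYPE/i.test(xml)) {
    throw new Error('XML documents with a DOCTYPE declaration are not accepted');
  }
  return xml;
}
```

In a handler like the ones shown earlier, the guard would wrap the body before parsing, for example `parseStringPromise(assertNoDoctype(body))`, with the thrown error mapped to a 400 response.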
Related Vulnerabilities
LDAP Injection
High: User input inserted into LDAP search filters without escaping, allowing attackers to manipulate directory queries, bypass authentication, or extract sensitive directory data.
HTTP Header Injection (CRLF Injection)
Medium: User-controlled input included in HTTP response headers without sanitization, allowing attackers to inject arbitrary headers or split the response into two separate HTTP responses.
Email Header Injection
Medium: Unsanitized user input used in email To, From, CC, or Subject fields, allowing attackers to inject additional recipients and turn your email server into a spam relay.
Log Injection
Low: User-supplied input written to logs without sanitization, allowing attackers to forge log entries, hide their tracks, or inject malicious content into log files.