XXE — XML External Entity Injection

XML parsers configured to process external entity references, allowing attackers to read arbitrary files from the server or trigger SSRF by crafting a malicious XML payload.

How It Works

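XML supports external entities: references to external files or URLs declared in the document's DTD with `<!ENTITY`. If the parser resolves them, an attacker can declare `<!ENTITY secret SYSTEM 'file:///etc/passwd'>`, reference it as `&secret;` in the document body, and the parser will substitute the file's contents, which the server may then return in its response. Pointing the entity at an internal URL instead turns the same technique into SSRF. Some widely used parsers, notably Java's default JAXP parsers, resolve external entities unless configured otherwise, so check your parser's settings rather than assuming it is safe.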
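To make the mechanics concrete, here is a minimal TypeScript sketch of what such a payload looks like and which parts matter. The helpers `findExternalEntities` and `isReferenced` are hypothetical names invented for this illustration, and the regex is a deliberate simplification (it only covers general `SYSTEM` entities), not a real XML parser.

```typescript
// Illustrative only: hypothetical helpers, not part of any XML library.
interface ExternalEntity {
  name: string;
  uri: string;
}

// The kind of document an attacker might submit. The DTD declares an entity
// named "secret" that points at a local file, and the body references it.
const payload = `<?xml version="1.0"?>
<!DOCTYPE order [
  <!ENTITY secret SYSTEM "file:///etc/passwd">
]>
<order><note>&secret;</note></order>`;

// Lists general SYSTEM entity declarations using a simplified regex.
// PUBLIC entities and parameter entities (<!ENTITY % ...>) are not covered.
function findExternalEntities(xml: string): ExternalEntity[] {
  const pattern = /<!ENTITY\s+([^\s%]+)\s+SYSTEM\s+(["'])(.*?)\2/g;
  const found: ExternalEntity[] = [];
  for (const match of xml.matchAll(pattern)) {
    found.push({ name: match[1], uri: match[3] });
  }
  return found;
}

// True when the document actually uses the entity (for example &secret;).
function isReferenced(xml: string, entityName: string): boolean {
  return xml.includes(`&${entityName};`);
}
```

A parser that resolves external entities would replace `&secret;` inside `<note>` with the contents of the referenced URI; the declaration alone does nothing until the entity is referenced.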

Vulnerable Code

// BAD: XML parser with entity substitution enabled (libxmljs shown here)
import libxmljs from 'libxmljs';
export async function POST(req: Request) {
  const body = await req.text();
  // noent: true tells the parser to expand entities, including external ones
  const doc = libxmljs.parseXml(body, { noent: true });
  return Response.json({ root: doc.root()?.text() });
}

Secure Code

// GOOD: use a parser that does not resolve external entities
import { parseStringPromise } from 'xml2js';
export async function POST(req: Request) {
  const body = await req.text();
  // xml2js does not resolve external entities. For parsers that can
  // (e.g. libxmljs), keep entity substitution off: noent: false
  const data = await parseStringPromise(body, {
    explicitArray: false, // single child elements become values, not one-item arrays
  });
  return Response.json(data);
}

Real-World Example

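XXE has been reported in high-profile systems, including a flaw in Facebook's OpenID handling disclosed through its bug bounty program in 2014, and in numerous enterprise attacks. It had its own category in the OWASP Top 10 2017 (A4) and was folded into A05:2021 Security Misconfiguration. It is commonly found in document upload features: SVG files are XML, and DOCX and XLSX files are ZIP archives of XML parts.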

How to Prevent It

Disable external entity processing in your XML parser configuration
Use a modern parser that disables external entities by default (xml2js, fast-xml-parser with default settings)
Validate that uploaded files are what they claim to be before parsing
Consider JSON instead of XML for APIs — it doesn't have this attack class
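As a defense-in-depth sketch of the checks above, the TypeScript below rejects XML that carries a DOCTYPE before it reaches the parser and verifies that an upload claiming to be DOCX or XLSX at least starts with the ZIP signature. The names `looksLikeZip`, `assertNoDoctype`, and `safeToParse` are hypothetical, and a string check like this is an assumption-laden complement to correct parser configuration, not a replacement for it.

```typescript
// Hypothetical helpers for illustration; the names are not from any library.

// DOCX and XLSX files are ZIP archives, which begin with the bytes "PK\x03\x04".
function looksLikeZip(bytes: Uint8Array): boolean {
  const magic = [0x50, 0x4b, 0x03, 0x04];
  return bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);
}

// Rejects any XML that carries a DOCTYPE declaration. Entities can only be
// declared inside a DTD, so refusing DTDs outright removes the XXE vector for
// inputs that never legitimately need one. The check is case-insensitive,
// which is stricter than XML itself requires.
function assertNoDoctype(xml: string): void {
  if (/<!DOCTYPE/i.test(xml)) {
    throw new Error('XML with a DOCTYPE declaration is not accepted');
  }
}

// Convenience wrapper: returns false instead of throwing.
function safeToParse(xml: string): boolean {
  try {
    assertNoDoctype(xml);
    return true;
  } catch {
    return false;
  }
}
```

In a request handler, `safeToParse(body)` would run on the decoded request text before the body is handed to the XML parser, and `looksLikeZip` would run on the raw upload bytes before any archive is unpacked.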

Affected Technologies

Node.js, Python, Java, PHP


Related Vulnerabilities

LDAP Injection

high

User input inserted into LDAP search filters without escaping, allowing attackers to manipulate directory queries, bypass authentication, or extract sensitive directory data.

CWE-90, A03:2021

HTTP Header Injection (CRLF Injection)

medium

User-controlled input included in HTTP response headers without sanitization, allowing attackers to inject arbitrary headers or split the response into two separate HTTP responses.

CWE-113, A03:2021

Email Header Injection

medium

Unsanitized user input used in email To, From, CC, or Subject fields, allowing attackers to inject additional recipients and turn your email server into a spam relay.

CWE-93, A03:2021

Log Injection

low

User-supplied input written to logs without sanitization, allowing attackers to forge log entries, hide their tracks, or inject malicious content into log files.

CWE-117, A09:2021