Severity: Medium · CWE-79 · OWASP A03:2021

PDF Generation Injection

Injecting HTML or JavaScript into PDF generation templates can let attackers read server-side files, make requests to internal network services, or execute scripts inside the server-side rendering engine.

How It Works

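Many applications generate PDFs from HTML templates using tools such as Puppeteer, wkhtmltopdf, or WeasyPrint. When user input is inserted into the template without escaping or sanitization, attackers can inject their own HTML. In engines that execute JavaScript, such as Puppeteer and wkhtmltopdf, they can also inject their own scripts. The rendering engine runs on the server, so injected content is fetched and executed from the server's network position rather than in a victim's browser. Depending on the engine's configuration, that content may also be able to read local files. An attacker can use <iframe> or <img> tags to make requests to internal services (SSRF). A <script> that sends an XMLHttpRequest to file:///etc/passwd can read local files, and a request for file:///proc/self/environ on Linux can exfiltrate environment variables. The leaked data is rendered into the resulting PDF, which the attacker then downloads.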
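The sketch below makes the injection step concrete without needing a real renderer. It copies the invoice template from the vulnerable code example into a hypothetical `buildInvoiceHtml` helper. The payload strings are illustrative assumptions, not taken from a real incident. Once interpolated, they become live `<iframe>` and `<script>` markup that a JavaScript-enabled engine would load or execute on the server.

```javascript
// Hypothetical helper that mirrors the vulnerable invoice template:
// user input is interpolated into HTML with no escaping at all.
function buildInvoiceHtml(customerName, items) {
  return `<h1>Invoice for ${customerName}</h1>
    <ul>${items.map(i => `<li>${i}</li>`).join('')}</ul>`;
}

// Illustrative attacker-controlled values.
// 1) SSRF: an iframe pointing at the cloud metadata address.
const ssrfPayload = "<iframe src='http://169.254.169.254/latest/meta-data/'></iframe>";
// 2) Local file read: a synchronous XHR to a file:// URL whose response is
//    written into the page (only works if the engine allows file:// access).
const filePayload =
  "<script>" +
  "const x = new XMLHttpRequest();" +
  "x.open('GET', 'file:///etc/passwd', false);" +
  "x.send();" +
  "document.write('<pre>' + x.responseText + '</pre>');" +
  "</script>";

const html = buildInvoiceHtml(ssrfPayload, ['Widget', filePayload]);

// Both payloads survive verbatim, so the renderer parses them as real markup.
console.log(html.includes('<iframe')); // true
console.log(html.includes('<script>')); // true
```

Nothing in the template distinguishes data from markup. That is why the fix has to happen before interpolation, or inside a renderer that cannot act on the markup.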

Vulnerable Code
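const express = require('express');
const puppeteer = require('puppeteer');
const app = express();
app.use(express.json());

app.post('/invoice', async (req, res) => {
  const { customerName, items } = req.body;
  // VULNERABLE: raw user input is interpolated straight into the HTML template
  const html = `<h1>Invoice for ${customerName}</h1>
    <ul>${items.map(i => `<li>${i}</li>`).join('')}</ul>`;
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(html); // injected <script>/<iframe> runs on the server
  const pdf = await page.pdf();
  await browser.close();
  res.type('application/pdf').send(Buffer.from(pdf));
});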
Secure Code
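const express = require('express');
const puppeteer = require('puppeteer');
const DOMPurify = require('isomorphic-dompurify');
const app = express();
app.use(express.json());

app.post('/invoice', async (req, res) => {
  const { customerName, items } = req.body;
  if (typeof customerName !== 'string' || !Array.isArray(items)) {
    return res.status(400).send('Invalid input');
  }
  // Strip dangerous markup (scripts, event handlers) from user input
  const name = DOMPurify.sanitize(customerName);
  const safeItems = items.map(i => DOMPurify.sanitize(String(i)));
  const html = `<h1>Invoice for ${name}</h1>
    <ul>${safeItems.map(i => `<li>${i}</li>`).join('')}</ul>`;
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setJavaScriptEnabled(false); // defense in depth: no script execution
    await page.setRequestInterception(true); // ...and no outbound requests
    page.on('request', r => r.abort());
    await page.setContent(html);
    const pdf = await page.pdf();
    res.type('application/pdf').send(Buffer.from(pdf));
  } finally {
    await browser.close();
  }
});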
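DOMPurify removes scripts and event handlers. By default, however, it keeps harmless-looking elements such as `<img src="...">`. If the renderer still has network access, such an element can trigger a request to an internal address. For fields that should only ever hold plain text, such as a customer name, HTML-escaping is a stricter option. The `escapeHtml` helper below is a minimal, hypothetical sketch of that approach. It is not part of DOMPurify or Puppeteer.

```javascript
// Hypothetical helper: encode the five HTML-significant characters so that
// user input is rendered as literal text instead of being parsed as markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const name = escapeHtml("<img src='http://169.254.169.254/latest/meta-data/'>");
const html = `<h1>Invoice for ${name}</h1>`;

console.log(html);
// <h1>Invoice for &lt;img src=&#39;http://169.254.169.254/latest/meta-data/&#39;&gt;</h1>
```

The escaped name appears in the PDF exactly as the user typed it, but the renderer never sees an `<img>` element, so no request is made.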

Real-World Example

In 2022, researchers demonstrated SSRF attacks through PDF generation in multiple SaaS platforms. By injecting HTML like <iframe src='http://169.254.169.254/latest/meta-data/'>, they accessed AWS metadata endpoints and extracted IAM credentials from PDF invoices and reports.

How to Prevent It

  • HTML-escape plain-text fields, and sanitize any input that must contain markup with DOMPurify, before inserting it into HTML templates
  • Disable JavaScript execution in the PDF rendering engine
  • Use text-only PDF libraries instead of HTML-to-PDF converters when possible
  • Block network and local file access from the PDF rendering process, for example with sandboxing or request interception
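Puppeteer supports request interception as a renderer-level control: `page.setRequestInterception(true)` plus a `page.on('request', ...)` handler. That handler calls `continue()` or `abort()` on each request. The sketch below shows only the decision logic, as a hypothetical `shouldAllowRequest` function. The allowlisted host `assets.example.com` is a placeholder assumption. The function permits inline `data:` URLs and HTTPS requests to an explicit allowlist. Everything else is blocked, including `file:` URLs and the 169.254.169.254 metadata address.

```javascript
// Hypothetical allowlist of hosts the renderer may fetch assets from.
// 'assets.example.com' is a placeholder; replace it with your own asset host.
const ALLOWED_HOSTS = new Set(['assets.example.com']);

// Decide whether a URL requested by the PDF renderer should be loaded.
// An allowlist is used instead of a blocklist of internal IP ranges, because
// alternate IP encodings and DNS tricks can slip past blocklists.
function shouldAllowRequest(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparsable URLs are rejected
  }
  if (url.protocol === 'data:') return true; // inline images/fonts only
  return url.protocol === 'https:' && ALLOWED_HOSTS.has(url.hostname);
}

// Usage with Puppeteer (not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', r => (shouldAllowRequest(r.url()) ? r.continue() : r.abort()));

console.log(shouldAllowRequest('https://assets.example.com/logo.png')); // true
console.log(shouldAllowRequest('http://169.254.169.254/latest/meta-data/')); // false
console.log(shouldAllowRequest('file:///etc/passwd')); // false
console.log(shouldAllowRequest('data:image/png;base64,AAAA')); // true
```

This complements, rather than replaces, process-level sandboxing. A network policy enforced outside the renderer still applies if the interception handler has a bug.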

Affected Technologies

Node.js · Python · Java · PHP

Data Hogo detects this vulnerability automatically.