OWASP A08 Data Integrity Failures Guide

OWASP A08:2021, Software and Data Integrity Failures, was a new category in the 2021 Top 10; it also absorbed Insecure Deserialization (A8:2017) from the previous edition. The reason it made the list that year isn't subtle: the SolarWinds attack in 2020 demonstrated that compromising a build pipeline is a more effective way to attack thousands of targets than attacking each one directly. OWASP responded by creating a category for integrity failures: assuming, without verifying, that code, data, and updates haven't been tampered with between where they originate and where your app runs them.

This guide covers what integrity failures look like in practice, real-world attacks that made the category necessary, and the specific code changes that prevent them. No theory-only explanations — every section ends with something you can check or fix today.

What OWASP A08 Actually Covers

The category has three main sub-problems:

Insecure deserialization — parsing user-controlled data into structured objects without validating the structure first. The classic exploit here is crafting a serialized payload that executes code on the server when your app deserializes it. CWE-502 (Deserialization of Untrusted Data) has been in MITRE's Top 25 Most Dangerous Software Weaknesses for years.

Missing integrity verification — loading scripts from a CDN, third-party service, or package registry without checking that the file matches what you expected. If the CDN gets compromised, your users run the attacker's code.

CI/CD pipeline integrity failures — running code in your build pipeline that you haven't verified. Unpinned GitHub Actions, unverified build scripts pulled from external sources, and auto-update mechanisms that don't check signatures all fall here.

These three problems share an OWASP category because they share one pattern: your app trusts something it shouldn't trust blindly. The fix in every case involves some form of verification: hashing, signature checking, or pinning to a known-good state.
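For the signature-checking case, a minimal sketch with Ed25519 keys from Node's built-in node:crypto module might look like this. It assumes the updater ships with the publisher's public key and receives a detached signature alongside each update package. The function names, the installFn callback, and the throwaway key pair are illustrative, not any particular updater's API.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Throwaway key pair so the sketch runs on its own. In a real updater the
// private key stays with the publisher and only the public key ships in the app.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Publisher side: sign the exact bytes of the update package.
// Ed25519 hashes internally, so the algorithm argument is null.
function signUpdate(pkg: Buffer, key: KeyObject): Buffer {
  return sign(null, pkg, key);
}

// Client side: true only if the signature matches these exact bytes and this key.
function verifyUpdate(pkg: Buffer, signature: Buffer, trustedKey: KeyObject): boolean {
  return verify(null, pkg, trustedKey, signature);
}

// Gate the install step on verification; installFn is a hypothetical callback.
function applyUpdate(
  pkg: Buffer,
  signature: Buffer,
  trustedKey: KeyObject,
  installFn: (pkg: Buffer) => void,
): void {
  if (!verifyUpdate(pkg, signature, trustedKey)) {
    throw new Error("Update signature is invalid; refusing to install");
  }
  installFn(pkg);
}
```

A valid signature only proves who published the bytes, not that the publisher's build was clean, which is exactly the gap the SolarWinds attack exploited.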

What Integrity Failures Look Like in Real Code

Missing Subresource Integrity on CDN Scripts

This is the most common integrity failure in front-end code. You include a library from a CDN with a plain <script> tag. If the CDN is compromised — or if the file is modified in transit — your users run whatever the CDN serves. No error, no warning.

<!-- BAD: No integrity check — the CDN could serve anything -->
<script src="https://cdn.example.com/library.min.js"></script>

<!-- GOOD: SRI hash means the browser rejects anything that doesn't match -->
<script
  src="https://cdn.example.com/library.min.js"
  integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
  crossorigin="anonymous"
></script>

The integrity attribute tells the browser: "compute the hash of the file you actually received, and if it doesn't match this value, don't run it." Subresource Integrity (SRI) is the spec name for this browser-enforced check. It's supported in every modern browser and takes about two minutes to implement. The crossorigin="anonymous" attribute matters too: the browser can only check the hash of a cross-origin file fetched with CORS, so without it, or without CORS headers from the CDN, the script is blocked. Most CDN providers include the SRI hash in their copy-paste embed code. If yours doesn't, srihash.org generates them for any URL.
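To see what the integrity value actually is, here's a small sketch using Node's node:crypto: the algorithm name, a dash, then the base64-encoded digest of the file's exact bytes. The helper names are hypothetical, and matchesIntegrity is a simplified single-hash version of the browser's check (the real algorithm also handles attributes that list several hashes).

```typescript
import { createHash } from "node:crypto";

const SRI_ALGORITHMS = ["sha256", "sha384", "sha512"] as const;
type SriAlgorithm = (typeof SRI_ALGORITHMS)[number];

function isSriAlgorithm(value: string): value is SriAlgorithm {
  return (SRI_ALGORITHMS as readonly string[]).includes(value);
}

// Integrity value = algorithm name + "-" + base64 digest of the exact bytes served.
function sriHash(content: string | Buffer, algorithm: SriAlgorithm = "sha384"): string {
  const digest = createHash(algorithm).update(content).digest("base64");
  return `${algorithm}-${digest}`;
}

// Simplified model of the browser check for an attribute holding a single hash:
// recompute the digest of what arrived and compare it with the expected value.
function matchesIntegrity(content: string | Buffer, integrity: string): boolean {
  const expected = integrity.trim();
  const algorithm = expected.split("-", 1)[0];
  if (!isSriAlgorithm(algorithm)) {
    return false; // unknown or malformed algorithm prefix
  }
  return sriHash(content, algorithm) === expected;
}
```

To hash a real file, pass its bytes, for example sriHash(readFileSync("library.min.js")) with readFileSync from node:fs. Hashing a different build or version of the file produces a value the browser will reject.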

Data Hogo's scanner checks for CDN script tags missing the integrity attribute. We've found missing Subresource Integrity in over 60% of front-end repos we've scanned, making it one of the most common A08 findings.

Unsafe Deserialization of User-Controlled Data

Deserialization — converting a string or byte stream back into a structured object — is a normal operation. The problem is doing it on data from an untrusted source without validating the result.

In JavaScript/TypeScript, JSON.parse() is technically safe from classic remote code execution exploits (unlike Java or PHP deserialization). But developers often treat a successfully-parsed JSON object as trusted data, skipping validation of its structure and types. That's where the integrity failure happens.

// BAD: parse and use without validation — attacker controls the shape
const config = JSON.parse(req.body.settings);
db.query(`SELECT * FROM data WHERE user_id = ${config.userId}`);
// config.userId could be anything — '1 OR 1=1' included

// GOOD: parse, then validate schema and types with Zod before trusting any value
import { z } from "zod";

const settingsSchema = z.object({
  userId: z.string().uuid(),
  theme: z.enum(["light", "dark"]),
  pageSize: z.number().int().min(10).max(100),
});

const parsed = JSON.parse(req.body.settings); // throws SyntaxError on malformed JSON
const config = settingsSchema.parse(parsed); // throws ZodError if the shape is wrong
// Parameterized query: the driver sends userId as data, never as SQL text
db.query("SELECT * FROM data WHERE user_id = $1", [config.userId]);

The Zod schema acts as a runtime integrity check on the data's shape. You're not just trusting that JSON.parse() produced something; you're verifying that what it produced matches the shape your code actually expects.

For languages with more powerful serialization (Java's ObjectInputStream, PHP's unserialize(), Python's pickle), the risk is much higher — arbitrary code execution on deserialization is well-documented and actively exploited. The fix there is to use safe serialization formats (JSON, Protobuf) instead of language-native serialization for any data that crosses a trust boundary.

Unpinned GitHub Actions

GitHub Actions workflows pull in external actions with a uses: directive. Most developers reference a version tag. The problem is that tags are mutable: the action's maintainer, or anyone who compromises the action's repository, can move the tag to point at completely different code.

# BAD: @main runs whatever is on the action's main branch right now
- uses: actions/checkout@main

# ALSO BAD: tags can be moved — v4 today might point to different code tomorrow
- uses: actions/checkout@v4

# GOOD: SHA is immutable — this exact commit hash cannot be silently replaced
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

A compromised GitHub Action runs inside your CI/CD job with the same access as the job's own steps: your source code, your deployment pipeline, and the secrets the job can reach, including the GITHUB_TOKEN, cloud credentials, and signing keys. A malicious action can exfiltrate secrets, modify your build artifacts, or inject code into your deployed application.
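Auditing workflows for this by hand doesn't scale across many files. A rough line-based sketch that flags uses: references not pinned to a full 40-character commit SHA might look like this. The function name is hypothetical, it deliberately skips local (./) actions and docker:// images, and a real YAML parser would be more robust than a regex for unusual formatting.

```typescript
// A full 40-character hex commit SHA is the only immutable ref form.
const FULL_SHA = /^[0-9a-f]{40}$/;

interface UnpinnedAction {
  line: number; // 1-based line number in the workflow file
  ref: string; // the value after "uses:"
}

function findUnpinnedActions(workflowYaml: string): UnpinnedAction[] {
  const findings: UnpinnedAction[] = [];
  workflowYaml.split(/\r?\n/).forEach((text, index) => {
    // Matches "uses: owner/repo@ref" and "- uses: ..." with optional quotes;
    // the capture stops before whitespace, quotes, or a trailing "# comment".
    const match = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/.exec(text);
    if (!match) return;
    const ref = match[1];

    // Local actions and Docker images are out of scope for this check
    if (ref.startsWith("./") || ref.startsWith("docker://")) return;

    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!FULL_SHA.test(version)) {
      findings.push({ line: index + 1, ref });
    }
  });
  return findings;
}
```

Running it over every file in .github/workflows/ gives a list of exact lines to pin.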


The pattern across all three of these: you're trusting something external without verifying its integrity. CDN scripts, deserialized user data, CI/CD actions — same category, same fix pattern.

The Real Attacks That Made A08 Necessary

SolarWinds Orion (2020)

The SolarWinds attack is the clearest real-world demonstration of why OWASP created A08. Attackers, later attributed to Russia's SVR, compromised SolarWinds' software build system. They injected malicious code into legitimate software updates for the Orion platform, a network monitoring tool widely used by businesses and U.S. federal agencies. SolarWinds estimated that fewer than 18,000 of its customers installed the compromised updates.

The compromised update was signed with SolarWinds' own certificate, passed all integrity checks the recipients had in place, and was distributed through SolarWinds' official update channel. Because organizations trusted the update mechanism, they installed it without question.

The lesson isn't that all update mechanisms are inherently unsafe. It's that trusting a mechanism without verifying what it delivers is an integrity failure — and that build pipelines are high-value targets because compromising them means attacking everyone downstream simultaneously.

Codecov Bash Uploader (2021)

Starting on January 31, 2021, attackers used a credential exposed through an error in Codecov's Docker image creation process to modify the Bash Uploader script that thousands of companies ran in their CI/CD pipelines. The modified script exfiltrated environment variables, including cloud credentials, API keys, and tokens, to an attacker-controlled server. Codecov disclosed the breach in April 2021.

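The attack went unnoticed for about two months, until a customer found that the checksum of the script they downloaded didn't match the one Codecov published on GitHub. Companies including Twilio, HashiCorp, and Rapid7 confirmed they were affected. The exfiltrated credentials were used for follow-on attacks.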

This is a textbook A08 failure: developers curled a script from Codecov's servers and piped it directly to bash, without verifying its integrity:

# BAD: download and execute without any integrity check
curl -s https://codecov.io/bash | bash

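# GOOD: download, verify the hash, and execute only if verification passed
set -euo pipefail
curl -fsSL -o codecov.sh https://codecov.io/bash
# "expectedhash" stands in for the SHA-512 value published separately from the script
echo "expectedhash  codecov.sh" | sha512sum -c
bash codecov.sh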

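Nobody piping a remote script to bash is thinking about supply chain attacks in the moment. That's exactly the kind of assumption A08 challenges. The expected hash also has to come from somewhere the attacker doesn't control: a checksum fetched from the same compromised server as the script proves nothing.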

3CX Supply Chain Attack (2023)

In March 2023, the 3CX desktop client, a VoIP application with millions of users, was compromised through a supply chain attack. Attackers had first compromised X_TRADER, trading software from a company called Trading Technologies. A 3CX employee installed the trojanized X_TRADER package, and the attackers used that foothold to steal credentials, move through 3CX's network, and compromise its build environment. The malicious code then shipped as part of signed 3CX updates.

The 3CX attack demonstrated that supply chain attacks can chain together: compromise vendor A to attack vendor B to attack vendor B's customers. Each link in that chain was a chance for an integrity check to catch the tampering before it propagated to everyone downstream.

Prevention: What to Actually Do

1. Add SRI to Every External Script

For every <script> or <link> tag loading from a CDN or third-party host:

<!-- Generate the hash from the exact file you load; the value below is a placeholder -->
<script
  src="https://cdn.jsdelivr.net/npm/alpinejs@3.14.3/dist/cdn.min.js"
  integrity="sha384-REPLACE_WITH_BASE64_HASH_OF_THIS_FILE"
  crossorigin="anonymous"
  defer
></script>

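Point SRI at a versioned URL like the one above: if the URL floats to the latest release, the file eventually changes and the browser then blocks it. If you're using a bundler (webpack, Vite, esbuild), prefer installing packages through npm rather than loading them from a CDN. You get better version control, npm records an integrity hash for each package in package-lock.json and checks it on install, and the code ships inside your own build artifact instead of coming from a third-party host at runtime.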

2. Pin GitHub Actions to SHA — Automate It

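Manually updating every pinned SHA is tedious and error-prone. Pin each action to a full commit SHA once, then let Dependabot keep the pins current: it recognizes SHA-pinned actions and opens pull requests that move them to newer releases, though it won't convert tag references to SHAs for you. For runtime visibility, the step-security/harden-runner action monitors what your workflow does on the runner, such as its outbound network connections.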

Your .github/dependabot.yml:

version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      actions:
        patterns:
          - "*"

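With this config, Dependabot checks weekly and opens pull requests that move your pinned SHAs to new releases; the groups block bundles all action updates into a single PR. Review those PRs like any other code change, since they change what runs in your pipeline. You keep immutable pins without hand-editing SHAs.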

3. Validate Everything You Deserialize

Any data that crosses a trust boundary — HTTP request bodies, query parameters, cookies, data from external APIs, items read from a queue — must be validated after parsing and before use.

In TypeScript, Zod is a widely used choice. Define schemas that match exactly what your code expects, parse incoming data through them, and turn a validation failure into a 400 Bad Request (catch the ZodError, or use safeParse) rather than letting it crash the handler or slip through as a logic error.

import { z } from "zod";
 
// Define exactly what shape you expect
const webhookPayloadSchema = z.object({
  event: z.enum(["push", "pull_request", "release"]),
  repository: z.object({
    id: z.number().int().positive(),
    full_name: z.string().max(200),
  }),
  sender: z.object({
    id: z.number().int().positive(),
    login: z.string().max(100),
  }),
});
 
// Parse before using — this throws if the shape is wrong
const payload = webhookPayloadSchema.parse(JSON.parse(rawBody));
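The advice to return a 400 instead of crashing can be sketched framework-neutrally like this. It assumes the zod package is installed and uses Zod's safeParse, which returns a result object rather than throwing. parseWebhookBody and the result type are hypothetical names; mapping the result onto res.status() is left to your framework.

```typescript
import { z } from "zod";

// Same shape as webhookPayloadSchema above, repeated so this sketch stands alone.
const webhookPayloadSchema = z.object({
  event: z.enum(["push", "pull_request", "release"]),
  repository: z.object({
    id: z.number().int().positive(),
    full_name: z.string().max(200),
  }),
  sender: z.object({
    id: z.number().int().positive(),
    login: z.string().max(100),
  }),
});

type WebhookPayload = z.infer<typeof webhookPayloadSchema>;

// Framework-neutral result; map it onto res.status(...) in Express, Fastify, etc.
type ParseResult =
  | { status: 200; payload: WebhookPayload }
  | { status: 400; error: string };

function parseWebhookBody(rawBody: string): ParseResult {
  let json: unknown;
  try {
    json = JSON.parse(rawBody);
  } catch {
    return { status: 400, error: "Request body is not valid JSON" };
  }

  // safeParse reports failures in its return value instead of throwing a ZodError
  const result = webhookPayloadSchema.safeParse(json);
  if (!result.success) {
    // Name the failing fields without echoing the attacker's values back
    const fields = result.error.issues.map((issue) => issue.path.join(".") || "(root)");
    return { status: 400, error: `Invalid payload fields: ${fields.join(", ")}` };
  }
  return { status: 200, payload: result.data };
}
```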

4. Verify Checksums for Downloaded Artifacts

Any build step that downloads a binary, archive, or script should verify its checksum before using it:

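# Stop at the first failing command, so nothing runs after a failed check
set -euo pipefail

# Download the binary (-f makes curl fail on HTTP errors instead of saving an error page)
curl -fLO https://releases.example.com/tool-v1.2.3-linux-amd64.tar.gz

# Verify the checksum against the published value ("expectedsha256hash" is a placeholder)
echo "expectedsha256hash  tool-v1.2.3-linux-amd64.tar.gz" | sha256sum -c

# Only reached if verification passed
tar -xzf tool-v1.2.3-linux-amd64.tar.gz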
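The same check can live inside a Node-based build script. This sketch uses node:crypto and node:fs; the helper names are hypothetical, and the expected value has to come from the release's published checksums, not from the same download.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Lowercase hex SHA-256, the same format sha256sum prints.
function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

function verifySha256(data: Buffer | string, expectedHex: string): boolean {
  // Normalize in case the pinned value was pasted in uppercase or with whitespace
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}

// Read an artifact and hand back its bytes only if the checksum matches.
function readVerified(path: string, expectedHex: string): Buffer {
  const data = readFileSync(path);
  if (!verifySha256(data, expectedHex)) {
    throw new Error(`Checksum mismatch for ${path}; refusing to use it`);
  }
  return data;
}
```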

For Docker images, pin to the image digest rather than the tag:

# BAD: the '22-alpine' tag moves every time the maintainers publish a new build
FROM node:22-alpine

# GOOD: the digest is immutable
FROM node:22-alpine@sha256:a4b1c2d3e4f5...
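To find base images that still use tags, a rough sketch that scans a Dockerfile's FROM lines might look like this. It's a hypothetical helper, not an existing tool: it skips scratch and references to earlier build stages, and it doesn't expand ARG variables or handle line continuations, so treat its output as a starting point.

```typescript
// Returns image references from FROM lines that aren't pinned to a sha256 digest.
function findUnpinnedBaseImages(dockerfile: string): string[] {
  const stageNames = new Set<string>();
  const unpinned: string[] = [];

  for (const rawLine of dockerfile.split(/\r?\n/)) {
    // FROM [--platform=...] <image> [AS <stage>]; Docker keywords are case-insensitive
    const match = /^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i.exec(rawLine);
    if (!match) continue;

    const image = match[1];
    const stage = match[2];
    const lower = image.toLowerCase();

    // "scratch" and earlier build stages are not pulled from a registry
    const isLocal = lower === "scratch" || stageNames.has(lower);
    if (!isLocal && !image.includes("@sha256:")) {
      unpinned.push(image);
    }
    if (stage) stageNames.add(stage.toLowerCase());
  }
  return unpinned;
}
```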

5. Audit Your CI/CD Pipeline for Trust Assumptions

Walk through every step of your CI/CD pipeline and ask: "What am I trusting here without verification?" Common answers include:

External actions pinned to tags instead of SHAs
curl | bash patterns for installing tools
Package installation without lockfile verification (npm install instead of npm ci)
Build artifacts uploaded to registries without signing
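Part of that walk-through can be automated. The sketch below checks each line of a workflow or build script for two of the patterns listed above, curl | bash and npm install. The rule set and function name are hypothetical and line-based, so commands split across lines or hidden inside other scripts won't be caught.

```typescript
interface Finding {
  line: number; // 1-based line number
  issue: string;
}

// Each rule pairs a pattern with the trust assumption it signals.
const RULES: { pattern: RegExp; issue: string }[] = [
  {
    // curl or wget output piped straight into sh, bash, or zsh (optionally via sudo)
    pattern: /\b(?:curl|wget)\b[^|\n]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b/,
    issue: "remote script piped to a shell without verification",
  },
  {
    // "npm install" or "npm i" where "npm ci" would enforce the lockfile
    pattern: /\bnpm\s+(?:install|i)\b/,
    issue: "npm install instead of npm ci",
  },
];

function auditPipeline(fileText: string): Finding[] {
  const findings: Finding[] = [];
  fileText.split(/\r?\n/).forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({ line: index + 1, issue: rule.issue });
      }
    }
  });
  return findings;
}
```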

The goal isn't paranoia — it's identifying where an attacker could inject code that runs in your pipeline environment.

How Data Hogo Scans for A08

Data Hogo checks for several A08 patterns automatically:

Data integrity failures — general patterns where data is processed without integrity verification
Missing Subresource Integrity — CDN script and stylesheet tags without integrity attributes
Unsafe deserialization — deserialization of user-controlled input without schema validation
Unpinned GitHub Actions — workflow steps using tag references instead of SHA pins

When we scanned 50 public repos during development, unpinned GitHub Actions was the most common finding — present in 78% of repos with GitHub Actions workflows. Missing SRI came second. Most developers simply haven't thought about these as security issues, because the exploitation path isn't as obvious as SQL injection or exposed API keys.

Scan your repo free to see which of these your project has. The scan takes under 5 minutes and covers A08 patterns alongside secrets, dependencies, and code-level vulnerabilities.

The Bigger Picture: CI/CD Is the New Attack Surface

Before OWASP added A08, the security conversation was mostly about the running application — authentication, injection, access control. A08 shifts attention to the supply chain: all the tools, dependencies, and automation that build and deploy the application.

The SolarWinds and Codecov attacks happened in 2020 and 2021. Since then, supply chain attacks have increased significantly. The pattern is consistent: attackers find it more efficient to compromise developer tooling than to attack each application individually.

Your CI/CD pipeline runs with elevated access. It reads your secrets, builds your artifacts, and deploys to production. It runs code from GitHub Actions, npm scripts, Docker images, and downloaded binaries. Every external thing it runs without verification is a potential integrity failure.

The fixes aren't complicated. Pinning a SHA takes 30 seconds. Adding an SRI hash takes two minutes. Validating deserialized data with Zod takes five minutes per endpoint. These are the kinds of low-effort improvements that meaningfully reduce your exposure to an entire category of attacks.

If you want to see where your project currently stands on A08, scan your repo with Data Hogo. You'll have findings in under 5 minutes — including which specific files and line numbers need attention.

Frequently Asked Questions

What is OWASP A08:2021 Software and Data Integrity Failures?

OWASP A08:2021 is a category that covers vulnerabilities where code or data can be modified without any verification. It includes missing Subresource Integrity (SRI) on CDN scripts, unsafe deserialization of user-controlled data, CI/CD pipelines that run unverified code, and auto-update mechanisms that don't verify authenticity. It was a new category in the 2021 OWASP Top 10, added largely because of supply chain attacks like SolarWinds, and it absorbed the 2017 list's Insecure Deserialization category.

What is a supply chain attack and how does it relate to OWASP A08?

A supply chain attack happens when an attacker compromises a dependency, build tool, or third-party script that your application trusts. Instead of attacking your app directly, they corrupt something your app pulls in. OWASP A08 covers this because the core problem is integrity — your app is pulling in code or data without verifying it hasn't been tampered with. The SolarWinds Orion attack (2020) and the Codecov breach (2021) are the most cited real-world examples.

How do I add Subresource Integrity (SRI) to a CDN script tag?

Hash the exact file the CDN serves with SHA-384 (SHA-256 and SHA-512 also work), base64-encode the digest, and prefix it with the algorithm name. Then add it as an integrity attribute on your script tag, together with crossorigin="anonymous": <script src="https://cdn.example.com/lib.js" integrity="sha384-[hash]" crossorigin="anonymous"></script>. If the file delivered by the CDN doesn't match the hash, the browser refuses to execute it. Most CDN providers show the SRI hash alongside their copy-paste embed code. You can also generate hashes using srihash.org.

Why is pinning GitHub Actions to a commit SHA safer than using a version tag?

A version tag like actions/checkout@v4 is a mutable pointer: whoever controls the repository can move it to different code at any time. A full commit SHA like actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 is effectively immutable, because changing the code would produce a different SHA, so your workflow keeps running exactly the code you reviewed. This matters because a compromised GitHub Action runs inside your CI/CD environment with access to the job's secrets and your deployment pipeline.

What is unsafe deserialization and why is it dangerous?

Deserialization is converting stored or transmitted data back into an object your code can work with. Unsafe deserialization happens when you deserialize data from an untrusted source without controlling what it can turn into. An attacker can craft a payload that, when deserialized, executes arbitrary code, bypasses authentication, or corrupts application state. CWE-502 (Deserialization of Untrusted Data) is consistently in MITRE's Top 25 Most Dangerous Software Weaknesses. The fix has two parts: use a data-only format such as JSON for anything that crosses a trust boundary, because native formats like Java serialization or Python's pickle can run attacker code during deserialization itself, and then validate the structure of the parsed data with a schema before using any of its values.