Data integrity is the guarantee that a pharmaceutical record is complete, accurate, and has not been altered without authorization. That sounds simple enough. The reason the pharmaceutical industry has spent more than a decade and hundreds of warning letters struggling with it is that “guarantee” is harder to engineer than it appears.

This is the foundational article for the data integrity pillar on this site. If you’re new to GxP quality, start here. If you’re an intermediate or advanced reader, the data lifecycle and audit trail design articles go deeper on the mechanics.


What Data Integrity Actually Means

In a regulated pharmaceutical context, data integrity doesn’t mean “the data is right.” It means the data can be proven to be right, and that the system generating, storing, and managing it doesn’t create opportunities for it to be wrong without detection.

The formal definition from the FDA’s December 2018 data integrity guidance is:

“Data integrity refers to the completeness, consistency, and accuracy of data. Complete, consistent, and accurate data should be attributable, legible, contemporaneously recorded, original or a true copy, and accurate (ALCOA).”

The MHRA’s March 2018 GxP Data Integrity Guidance adds that data integrity “requires that data is complete, consistent and accurate throughout the data lifecycle.”

Both definitions are pointing at the same thing: data needs to be trustworthy at every stage of its life, from initial generation on an instrument, through review and approval, to long-term archival. A record that was accurate when generated but was then modified, selectively retained, or stored in a way that makes it inaccessible has lost its integrity.


Why Regulators Care

The short answer: because the data is the evidence.

When FDA evaluates whether a drug is safe and effective, when a quality team decides to release a batch, when a stability study supports a shelf-life claim, all of those decisions rest on data. If the data can’t be trusted, the decision can’t be trusted. And the consequence of a wrong quality decision in this industry isn’t a customer complaint. It’s a patient safety event.

This is why the FDA’s regulatory framework ties data integrity directly to GMP compliance. Under 21 CFR Part 211 (the finished pharmaceutical GMP regulation), requirements for record accuracy appear in multiple sections:

  • 21 CFR 211.68: requirements for computerized systems, including audit trails
  • 21 CFR 211.180: general record retention and accessibility requirements
  • 21 CFR 211.194: laboratory record completeness

These aren’t data integrity requirements in name, but they’re the statutory basis for the FDA’s enforcement actions when data is found to be unreliable. When an inspector finds that audit trails were disabled or that failing test results were deleted, the 483 observation is typically written against 211.68 or 211.194.

The EMA Annex 11, which governs electronic records in EU GMP, and the PIC/S PI 041-1 guidance on data integrity set equivalent expectations for the EU and PIC/S member countries respectively.


The Regulatory Framework in Brief

For someone entering this field, the regulatory landscape can be confusing. Here’s the map:

United States:

  • FDA 21 CFR Part 211, GMP for finished pharmaceuticals
  • FDA 21 CFR Part 11, electronic records and signatures
  • FDA guidance: “Data Integrity and Compliance With Drug CGMP” (Dec 2018)

European Union:

  • EudraLex Volume 4, Annex 11, computerized systems
  • EudraLex Volume 4, Annex 15, qualification and validation
  • EMA questions and answers on data integrity (ongoing)

United Kingdom (post-Brexit):

  • MHRA GXP Data Integrity Guidance (March 2018), the clearest and most explicit national guidance available

International:

  • WHO Technical Report Series 996, Annex 5 (2016), DI guidance for WHO member states
  • PIC/S PI 041-1, harmonized guidance across PIC/S member countries
  • ICH Q10 (Pharmaceutical Quality System), addresses quality culture and data reliability

These documents don’t all say exactly the same thing in exactly the same words, but they converge on the same requirements. An ALCOA+ framework applied consistently will satisfy all of them.


The Cost of Getting It Wrong

This is where it’s worth being concrete, because “regulatory enforcement action” doesn’t quite capture what actually happens to a company that gets a serious data integrity finding.

FDA 483 observations are the inspector’s written findings at the end of a site inspection. A data integrity 483 typically triggers an immediate response requirement, you have 15 business days to submit your written response, and FDA evaluates whether your proposed corrective actions are adequate.

Warning Letters are issued when FDA concludes that a 483 response was inadequate, or when the findings were serious enough to warrant immediate escalation. A warning letter is a public document, it appears on FDA’s website and is searchable. Warning letters typically require a response within 15 business days and trigger ongoing oversight, including potential follow-up inspections.

Import Alerts can be placed on a facility’s products, preventing shipment to the United States. For a company whose primary market is the US, this can halt revenue entirely.

Consent Decrees are the most severe outcome: a legal agreement between FDA and a company that subjects the company’s manufacturing operations to FDA oversight and often requires remediation at the company’s expense, with an independent auditor monitoring compliance. Several major pharmaceutical companies have operated under consent decrees for years.

Beyond regulatory consequences, there are commercial consequences. A warning letter affects supply agreements, partner trust, and investor confidence. A clinical hold tied to data integrity concerns can derail a development program. The cost of a serious data integrity remediation program, including assessments, system upgrades, retraining, and external consultants, routinely runs into eight figures.

None of this is hypothetical. The FDA warning letter database at fda.gov contains dozens of letters citing data integrity as the primary basis for enforcement, spanning every category of pharmaceutical manufacturer.


The ALCOA+ Framework

The practical standard for what makes data trustworthy is captured in the ALCOA+ acronym. It stands for: Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, Available.

Every data integrity requirement in every regulatory guidance maps back to one or more of these principles. The full breakdown is in the ALCOA+ deep-dive article, what each principle means, where programs most commonly fail, and how to assess compliance.

The key insight about ALCOA+: it’s not a checklist to sign off on. It’s a description of what a trustworthy record needs to be. A record that is accurate but not attributable is not trustworthy, you know the number but not who produced it or when. A record that is attributable and accurate but not enduring fails when the system is decommissioned and the data becomes inaccessible.

All nine elements are necessary. None of them alone is sufficient.


Where Data Integrity Programs Most Commonly Fail

After years of watching both well-run programs and failed ones, the failure modes cluster into a few categories. These aren’t company-specific observations, they’re the patterns that appear across the FDA warning letter database and in published inspection literature.

Technical gaps: Systems configured with audit trails disabled or partially enabled. Shared user accounts that destroy attributability. Data stored in proprietary formats without a migration plan. Instrument clocks not synchronized, creating apparent backdating.

Process gaps: Documentation completed after the fact rather than contemporaneously. OOS (out-of-specification) procedures that allow result invalidation without adequate investigation. No defined review process for audit trail entries.

Governance gaps: No inventory of GxP systems and the data each generates. No data lifecycle policy defining retention periods and archival standards. No assigned ownership for data integrity at the system or site level.

Culture gaps: An environment where analysts feel pressure to produce passing results, creating incentives to manipulate or selectively report data. Management that treats data integrity as a paperwork exercise rather than an engineering property of the quality system.

The companies that do this well treat data integrity as a systems property, something built into how instruments are configured, how procedures are designed, and how review is structured, not as a documentation layer added after the fact.


How This Site Is Organized

If you’re new to the field, the recommended reading sequence is:

  1. This article, the regulatory basis and stakes
  2. ALCOA+: The Framework Behind Every Data Integrity Requirement, the nine principles in depth
  3. The Data Lifecycle: From Generation to Archival, where data can go wrong across its full life
  4. Audit Trail Design and Review, the mechanics of the most commonly cited finding type

For deeper coverage, the Computer System Validation pillar covers how these requirements apply to software systems, and the Audits & Inspection pillar covers what an FDA inspector actually does when they sit down to review your data.

The three learning tracks, Foundations, Implementation, and Advanced, provide curated reading sequences if you’re looking for a structured path.