A data governance framework is the organizational layer that makes a data integrity program sustainable. You can build every technical control correctly, complete audit trails, individual user accounts, validated systems, and still have a fragile program if no one owns the problem at a system level, if there’s no process for assessing new systems before they go into GxP use, and if there’s no way to know what data you actually have.
This article is aimed at practitioners building or assessing a data governance program at the site or enterprise level. The technical controls are covered elsewhere, audit trail design, data lifecycle management. What’s covered here is the organizational infrastructure that makes those controls work at scale.
What Data Governance Is (and Isn’t)
Data governance is not the same as data integrity. Data integrity is a property, a characteristic of the data itself. Data governance is the program that ensures that property is achieved and sustained across all GxP systems, over time, despite system changes, personnel changes, and organizational growth.
A data governance program answers three questions:
- What GxP data do we have, and where does it live?
- Who is responsible for its integrity at each stage of its lifecycle?
- How do we ensure the controls remain effective as systems and processes evolve?
Without answers to these questions, data integrity is a reactive discipline, you fix problems after inspectors find them, rather than preventing them. Companies that pass difficult DI inspections have systematic answers to all three. Companies that fail usually can’t answer the first one: they don’t have a current, complete inventory of their GxP systems.
The System Inventory
The starting point for any data governance program is knowing what you have. This sounds trivial. It isn’t.
In a mid-size pharmaceutical operation, there may be 40–80 computerized systems that touch GxP data in some way, instruments with embedded software, standalone laboratory informatics systems, manufacturing execution systems, quality management systems, document management platforms, enterprise resource planning modules, and clinical data systems. Some of these are formally validated; others have been running for years in a semi-validated state; others are informal tools that analysts use without IT visibility.
A system inventory should capture for each system:
- System name and version
- Business function and the GxP activities it supports
- Data types generated (results, batch records, deviations, approvals, etc.)
- Validation status and date of last qualification
- Data criticality tier (see below)
- System owner (individual, not a team)
- Location of audit trail data and retention configuration
- Interfaces to other systems
Building this inventory for the first time is time-consuming. Keeping it current requires a process, specifically, it needs to be part of the change control procedure so that any new system, significant upgrade, or decommissioning event triggers an inventory update.
Data Criticality Tiering
Not all GxP data carries the same risk. Applying the same level of oversight to a room temperature monitoring system as to a lot release system misallocates resources. A criticality-based tier model allows proportionate controls.
A common three-tier model:
Tier 1, Critical: Data that directly supports lot release, regulatory submission, product safety, or patient identification decisions. Examples: final release assay results, batch manufacturing records, certificate of analysis data, potency results, chain of custody for patient-derived materials. Tier 1 systems receive the highest level of audit trail review, access control, and periodic qualification review.
Tier 2, Supporting: Data that indirectly supports GxP decisions or is required for regulatory documentation but does not directly determine lot disposition. Examples: in-process monitoring records, equipment calibration records, environmental monitoring trends, training records. Tier 2 systems receive periodic audit trail review and standard access controls.
Tier 3, Peripheral: Data used for operational efficiency or trending but not required for any individual GxP decision. Examples: utility monitoring, non-GxP administrative records, planning tools. Tier 3 systems receive baseline controls proportionate to their regulatory context.
This tiering should be documented, reviewed by QA, and linked to the control requirements for each tier. The rationale for each system’s tier assignment is part of the governance record.
Data Ownership Model
For every system in Tier 1 and Tier 2, there should be a named individual who owns the data integrity of that system. Not a team, not a department, a person.
The data owner is accountable for:
- Ensuring the system’s audit trail is configured correctly and remains configured correctly through changes
- Ensuring periodic audit trail review occurs on schedule
- Escalating access control issues (shared accounts, inappropriate privileges)
- Managing the system’s data lifecycle (retention periods, archival, decommissioning)
- Coordinating with IT on system changes that could affect data integrity controls
The data owner is not necessarily the system administrator or the power user. They’re the person who understands the GxP criticality of the system and is accountable for its data quality. In practice, this is often the department head who relies most heavily on the system, analytical QC director for a CDS, manufacturing lead for an MES.
Ownership assignments should be documented, reviewed annually (or when personnel change), and maintained in the system inventory.
Data Flow Mapping
A data flow map documents how data moves from system to system, where it originates, what transformations it undergoes, and where it ends up. This is the tool that makes data transfer risks visible.
For a typical analytical result, the data flow might look like:
Instrument (raw acquisition) → CDS (integration/calculation) →
LIMS (result storage/specification check) →
Batch Record (summary reporting) →
Regulatory Submission (data package compilation)
Each arrow in that diagram is a transfer point. At each transfer point, there are integrity risks: truncation, reformatting, loss of metadata, manual transcription error. A data flow map makes these risks explicit so that controls can be applied at each point.
Data flow mapping is required implicitly by the FDA DI guidance, which expects companies to understand “where data is generated, how it is transferred, where it is stored, and who has access to it.” It is also the foundation for a DI risk assessment, you can’t assess risk in a system you haven’t mapped.
DI Risk Assessment Methodology
A DI risk assessment evaluates each system (or each data type within a system) against the likelihood that integrity could be compromised and the consequence if it is. It produces a prioritized list of gaps and remediation actions.
A practical approach using a simplified version of the ICH Q9(R1) risk framework:
Risk factors to evaluate:
| Factor | Questions to ask |
|---|---|
| Data criticality | What decisions depend on this data? What is the consequence of a wrong decision? |
| Access controls | Who can create, modify, or delete data? Are shared accounts in use? |
| Audit trail completeness | Does the trail capture prior values? Who can clear or disable it? |
| System maturity | Is the system validated? How current is the validation? |
| Interface controls | How is data transferred to and from this system? Are transfers verified? |
| Retention adequacy | Will data be readable and retrievable for the required period? |
Score each factor (e.g., 1–3) and combine into an overall risk rating. The rating determines the priority for remediation and the level of oversight the system requires.
The risk assessment should be reviewed at least annually and updated after any significant system change, significant process change, or significant finding at the site or at comparable sites in the industry.
The New System Assessment Process
One of the highest-use governance controls is the pre-deployment assessment: evaluating a new system against data integrity requirements before it goes into GxP use. This prevents the accumulation of systems that were deployed without considering DI requirements and must be remediated later.
A pre-deployment DI assessment covers:
- Does the system have an audit trail? At what level of completeness?
- Can individual user accounts be created and managed?
- How are electronic signatures implemented?
- What are the data storage and retention capabilities?
- What interfaces does the system need, and how will transfer integrity be verified?
- Is there a validated migration path if the system is decommissioned?
- Does the vendor have a GxP/Part 11 compliance documentation package?
This assessment should be part of the user requirements specification (URS) process for any new GxP system. The DI requirements should be written into the URS, and the vendor’s compliance documentation should address them before the system is purchased or contracted.
Organizational Structure
Data governance doesn’t sustain itself. It requires an organizational structure with defined accountability and a routine review cadence.
What works in practice:
A data integrity committee (or its equivalent) that meets quarterly and includes: the site Quality Director or equivalent, department heads for areas with Tier 1 systems, IT leadership, and the data governance lead. The committee reviews the system inventory, the status of outstanding DI findings and remediation, the results of periodic audit trail reviews, and any new system assessments in progress.
A dedicated data governance role, someone whose job includes maintaining the system inventory, coordinating DI risk assessments, tracking remediation, and supporting the committee. This doesn’t have to be a full-time role in smaller organizations, but it needs to be assigned to a specific person with time allocated for it.
What doesn’t work:
Assigning data governance to QA as an add-on responsibility without dedicated time. Treating the system inventory as a one-time project rather than a living document. Conducting DI risk assessments once and never updating them. Creating a data governance policy without establishing the process and accountability that gives it teeth.
Remediation Planning
Most organizations that build a data governance program from scratch will identify gaps. Some will be significant. The remediation planning process needs to be realistic about timeline and resource, and needs to prioritize based on the risk assessment.
Prioritization framework:
- Immediate (30–90 days): Critical control gaps on Tier 1 systems, shared user accounts, disabled audit trails, missing prior value capture on release systems.
- Short-term (3–6 months): Control gaps on Tier 1 systems that require system configuration changes or procedural updates.
- Medium-term (6–18 months): Control gaps on Tier 2 systems, legacy system upgrades or replacements, data migration from unsupported formats.
- Long-term (18+ months): Enterprise architecture changes, system consolidations, validated interfaces replacing manual transfers, enhanced reporting infrastructure.
The remediation plan should be documented, approved by QA and site leadership, and maintained as a living document with status tracking. If an inspection occurs during a remediation in progress, a documented plan with clear milestones and accountability is a meaningful mitigating factor. An undocumented remediation effort is not.
The Governance Audit
A useful self-assessment tool is the governance audit: periodically walking through the data flow for a specific, complete process, say, a full analytical result from instrument acquisition to batch disposition, and evaluating the integrity controls at every step against documented requirements.
This is not the same as a system audit (which evaluates a specific system) or an audit trail review (which evaluates specific records). The governance audit looks at the system as a whole: does data integrity governance actually work end-to-end for a real business process?
The output is a gap analysis and an updated risk assessment. Companies that do this regularly, quarterly for critical processes, annually for supporting processes, find their own problems before regulators do. That’s the entire goal of a mature data governance program.