Privacy-preserving record linkage in justice

From data sharing to intelligence sharing

The central challenge is not data access but identity linkage: how to determine that records from different systems refer to the same individual without revealing the underlying personal information.

This problem has gained prominence under the umbrella of privacy-enhancing technologies (PETs) — a growing set of approaches designed to extract value from data while minimizing exposure of personally identifiable information (PII). As artificial intelligence (AI) adoption accelerates and privacy legislation expands, PETs are rapidly moving from niche research topics to operational necessities.

In the justice context, the implications are significant. Instead of moving raw identity data between agencies or vendors, PET-based approaches let systems exchange privacy-enhanced representations of identity attributes. These mathematical representations preserve the properties needed for matching without exposing the original identifiers.

How privacy-preserving record linkage works

Privacy-preserving record linkage relies on three complementary methods.

First, identity attributes are transformed at the source. Names, addresses, identifiers and contact information are converted into encrypted and de-identified representations before they ever leave an agency’s control. Multiple representations of the same attribute can coexist, each optimized for a different matching scenario: high-precision identifiers, imprecise free-text fields or phonetic similarities. Deterministic, irreversible encryption forms the base, but on its own it is not enough: the encrypted values of “David” and “Dave” may be very far apart, which prevents matching by similarity. That is where locality-sensitive hashing comes in. Identity attributes are transformed so that similar inputs produce nearby outputs, and the vector distance between protected values roughly tracks the difference between the original inputs.
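To make the “David” versus “Dave” contrast concrete, here is a minimal Python sketch of two representations of the same name. It assumes an HMAC-SHA256 keyed hash stands in for the deterministic, irreversible base layer and uses MinHash over character bigrams as the locality-sensitive scheme; the source names neither technique, and `AGENCY_KEY` and all function names are hypothetical. In this sketch, tokens and signatures are only comparable between parties that use the same key.

```python
import hashlib
import hmac

# Hypothetical secret held by the linking parties; without it tokens cannot be recomputed.
AGENCY_KEY = b"hypothetical-agency-secret"

def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return " ".join(value.lower().split())

def exact_token(value: str, key: bytes = AGENCY_KEY) -> str:
    """Deterministic, one-way token: equal inputs give equal tokens, nothing else is preserved."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()

def bigrams(value: str) -> set[str]:
    """Character bigrams of the padded value, e.g. 'dave' -> {'_d', 'da', 'av', 've', 'e_'}."""
    padded = f"_{normalize(value)}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def minhash_signature(value: str, num_hashes: int = 128, key: bytes = AGENCY_KEY) -> list[int]:
    """MinHash (a locality-sensitive hash for Jaccard similarity): for each keyed hash
    function, keep the smallest hash value over the input's bigrams."""
    grams = bigrams(value)
    signature = []
    for i in range(num_hashes):
        seed = key + i.to_bytes(4, "big")  # derive the i-th keyed hash function
        signature.append(min(
            int.from_bytes(hmac.new(seed, g.encode("utf-8"), hashlib.sha256).digest()[:8], "big")
            for g in grams
        ))
    return signature

def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of agreeing positions, which estimates the Jaccard similarity of the bigram sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

print(exact_token("David") == exact_token("Dave"))     # False: no notion of closeness
print(exact_token("David") == exact_token(" david "))  # True: same normalized input
print(estimated_similarity(minhash_signature("David"), minhash_signature("Dave")))
```

The exact tokens for “David” and “Dave” share nothing, while their signatures agree in roughly the proportion of shared bigrams: 3 of 8 distinct bigrams, a Jaccard similarity of 0.375, so an estimate near that value is expected. More hash functions reduce the variance of the estimate at the cost of larger signatures.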

Second, matching occurs without decryption. Using vector algebra and symbolic logic, systems can evaluate whether two protected records are likely to refer to the same individual. Because the process relies on deductive rules rather than probabilistic inference, the results are repeatable, explainable and auditable — qualities that are especially important in justice environments.
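The sketch below illustrates deterministic rule evaluation that only ever touches protected values. The `ProtectedRecord` fields, the 0.5 threshold and the signature values are hypothetical toy data standing in for tokens and locality-sensitive signatures produced at the source; in practice the predicates and thresholds would be chosen by analysts. No step reads or reconstructs a raw name or date of birth.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ProtectedRecord:
    record_id: str
    dob_token: str                   # deterministic one-way token of date of birth
    name_signature: tuple[int, ...]  # locality-sensitive signature of the name

Predicate = Callable[[ProtectedRecord, ProtectedRecord], bool]

def dob_equal(a: ProtectedRecord, b: ProtectedRecord) -> bool:
    # Exact comparison of opaque tokens; nothing is decrypted.
    return a.dob_token == b.dob_token

def name_similar(a: ProtectedRecord, b: ProtectedRecord, threshold: float = 0.5) -> bool:
    # Share of agreeing signature positions approximates similarity of the hidden names.
    agree = sum(x == y for x, y in zip(a.name_signature, b.name_signature))
    return agree / len(a.name_signature) >= threshold

# Hypothetical rule: a named conjunction of labeled predicates.
RULES: dict[str, list[tuple[str, Predicate]]] = {
    "dob_and_similar_name": [("dob_equal", dob_equal), ("name_similar>=0.5", name_similar)],
}

def evaluate(a: ProtectedRecord, b: ProtectedRecord) -> list[tuple[str, list[str]]]:
    """Return each rule that fires, with the predicates that justify it.
    Same inputs always give the same output: no randomness, no learned state."""
    fired = []
    for rule_name, predicates in RULES.items():
        if all(pred(a, b) for _, pred in predicates):
            fired.append((rule_name, [label for label, _ in predicates]))
    return fired

# Toy protected records (hypothetical values).
rec_a = ProtectedRecord("A-1", "tok-19c2", (11, 42, 7, 93, 15, 60, 28, 5))
rec_b = ProtectedRecord("B-7", "tok-19c2", (11, 42, 7, 80, 15, 61, 28, 9))  # 5 of 8 positions agree
rec_c = ProtectedRecord("C-3", "tok-55e0", (11, 42, 7, 93, 15, 60, 28, 5))  # different DOB token

print(evaluate(rec_a, rec_b))  # [('dob_and_similar_name', ['dob_equal', 'name_similar>=0.5'])]
print(evaluate(rec_a, rec_c))  # []
```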

Third, relationships are modeled as graphs in which links are transitive. If record A links to B, and B links to C, the system infers that A links to C, detecting broader identity clusters even when individual records are sparse or incomplete. Crucially, the logic used to establish each link remains transparent because it is recorded on the edges of the graph.
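One way to realize the inference above (A links to B and B links to C, so A links to C) is to treat links as an undirected graph, take its connected components as identity clusters, and answer “why are these two records linked?” by returning the chain of edges with their rule labels. This is a minimal Python sketch under those assumptions; the record IDs and rule names are hypothetical, and breadth-first search is only one of several ways (union-find is another) to compute the clusters.

```python
from collections import deque
from typing import Optional

Graph = dict[str, dict[str, str]]  # record -> {neighbor: rule that established the link}

def build_graph(links: list[tuple[str, str, str]]) -> Graph:
    """Undirected graph whose edges carry the rule that produced each link."""
    graph: Graph = {}
    for a, b, rule in links:
        graph.setdefault(a, {})[b] = rule
        graph.setdefault(b, {})[a] = rule
    return graph

def clusters(graph: Graph) -> list[set[str]]:
    """Connected components: records linked directly or through intermediate records."""
    seen: set[str] = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph.get(node, {}):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(component)
    return result

def explain(graph: Graph, source: str, target: str) -> Optional[list[tuple[str, str, str]]]:
    """Shortest chain of links from source to target, with the rule on each edge."""
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr in graph.get(node, {}):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if target not in parent:
        return None  # not in the same cluster
    path, node = [], target
    while parent[node] is not None:
        prev = parent[node]
        path.append((prev, node, graph[prev][node]))
        node = prev
    return list(reversed(path))

links = [("A", "B", "dob_and_similar_name"), ("B", "C", "same_state_id"), ("D", "E", "same_state_id")]
graph = build_graph(links)
print(clusters(graph))               # two clusters: {A, B, C} and {D, E}
print(explain(graph, "A", "C"))      # [('A', 'B', 'dob_and_similar_name'), ('B', 'C', 'same_state_id')]
```

A and C were never compared directly, yet they land in one cluster, and `explain` shows exactly which rules justified each hop.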

Why determinism and explainability matter

Many modern data systems lean heavily on probabilistic models and machine learning. While powerful, these approaches can introduce opacity, model drift and difficulties with explainability. In the language of formal logic, inductive reasoning, which draws general conclusions from specific instances, is not “truth preserving”: true premises do not guarantee a true conclusion. And truth is an important word in justice.

In contrast, deterministic, rule-based linkage is “truth preserving”: if the rules and the input data are correct, the links they produce are correct too, and the same inputs always yield the same results. Analysts can add rules as they learn more about the data and author different rulesets for different matching confidence levels. Every matching decision can be traced back to the exact predicates that produced it. This matters not only for technical teams but also for governance, legal review and public trust.
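A minimal sketch of confidence-tiered rulesets, assuming each protected record is a dictionary of opaque tokens and each rule is a conjunction of exact token matches. The tier names, rule names and token fields are hypothetical. Because the function returns the rule and predicates that fired, every decision carries its own audit trail, and adding a rule is an edit to `RULESETS` rather than a retraining step.

```python
from typing import Optional

# Hypothetical protected record: field name -> opaque token (never the raw value).
Record = dict[str, str]

# Rulesets ordered from strongest to weakest confidence. Each rule names the fields
# whose tokens must all be equal, i.e. a conjunction of exact-match predicates.
RULESETS: list[tuple[str, dict[str, tuple[str, ...]]]] = [
    ("high", {
        "same_state_id": ("state_id_tok",),
        "same_name_and_dob": ("name_tok", "dob_tok"),
    }),
    ("medium", {
        "phonetic_name_and_dob": ("name_phonetic_tok", "dob_tok"),
    }),
]

def classify(a: Record, b: Record) -> Optional[tuple[str, str, list[str]]]:
    """Return (confidence, rule, predicates) for the first rule that fires, else None.
    A field missing from either record never counts as a match."""
    for confidence, rules in RULESETS:
        for rule_name, fields in rules.items():
            if all(f in a and f in b and a[f] == b[f] for f in fields):
                return confidence, rule_name, [f"{f} equal" for f in fields]
    return None

r1 = {"state_id_tok": "t1", "name_tok": "n1", "name_phonetic_tok": "p1", "dob_tok": "d1"}
r2 = {"name_tok": "n2", "name_phonetic_tok": "p1", "dob_tok": "d1"}
r3 = {"state_id_tok": "t1"}

print(classify(r1, r2))  # ('medium', 'phonetic_name_and_dob', ['name_phonetic_tok equal', 'dob_tok equal'])
print(classify(r1, r3))  # ('high', 'same_state_id', ['state_id_tok equal'])
print(classify(r2, r3))  # None
```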

Unlocking value without increasing risk

When identity linkage no longer requires sharing raw PII, new possibilities emerge.

Agencies can generate cross-jurisdictional metrics without exposing personal identities. Situational awareness can be enhanced through event-based notifications that do not expose identity information. Third-party analytics and cloud environments can be used without expanding breach risk or liability.
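As an illustration of a cross-jurisdictional metric, the sketch below counts how many individuals appear only in one agency’s records, only in the other’s, or in both, while the party doing the counting only ever sees keyed tokens. It assumes the two agencies share a secret HMAC key that the analytics provider does not hold; `SHARED_KEY`, the identifiers and the normalization rule are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key shared by the participating agencies, never by the analytics provider.
SHARED_KEY = b"hypothetical-shared-linkage-key"

def token(identifier: str) -> str:
    """Deterministic keyed token for a normalized identifier."""
    normalized = identifier.strip().upper()
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

def overlap_metrics(tokens_a: set[str], tokens_b: set[str]) -> dict[str, int]:
    """Computed by a third party that only ever sees tokens, never identifiers."""
    shared = tokens_a & tokens_b
    return {
        "only_a": len(tokens_a - shared),
        "only_b": len(tokens_b - shared),
        "both": len(shared),
    }

# Each agency tokenizes locally and sends only its token set onward (toy identifiers).
county_a = {token(x) for x in ["AB123", "CD456", "EF789"]}
county_b = {token(x) for x in [" cd456", "GH012"]}
print(overlap_metrics(county_a, county_b))  # {'only_a': 2, 'only_b': 1, 'both': 1}
```

Keying the hash matters here: an unkeyed hash of a low-entropy identifier could be reversed by hashing every plausible value, whereas without the key the tokens cannot be recomputed.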

Perhaps most importantly, agencies retain control over their own confidentiality posture. Different organizations can apply different privacy transformations to the same types of data, without needing to coordinate or disclose their methods. Intelligence is shared, but identity remains protected.

A shift in mindset

Privacy-preserving record linkage is more than a technical solution — it represents a shift in how institutions think about data collaboration. In an era of accelerating policy change, rising public scrutiny and expanding analytical ambition, that shift may prove essential. The future of justice data is not about choosing between insight and privacy. It is about designing systems that deliver both — by design, not by exception.

The case for privacy-preserving record linkage in justice

Justice agencies no longer have to choose between collaboration and privacy as new technologies enable insight without identity exposure.
